Using Jitter to Avoid Over Plotting in Power BI

Philip Treacy

January 11, 2020

If you need to plot data that has one variable where values can be the same or very similar, for example the age of a group of people, you'll likely end up with data points that are plotted over the top of each other.

To make this type of plot easier to read and to give the reader a better understanding of the data, we can use jitter.

Overplotted Data

over plotted data without jitter

Jittered Data

scatter plot with jitter

Jitter means adding a small random offset to each plotted point so that all the points are easier to see. In this case we can move the points a little to the left and right.

The underlying data isn't changed, just the plotted point's position.
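To make the idea concrete, here is a minimal sketch in plain Python of what jittering does. The function name jitter_positions, the width of 0.1 and the sample ages are illustrative assumptions, not taken from any library or from the sample file.

```python
import random


def jitter_positions(positions, width=0.1, seed=None):
    """Return a copy of positions, each shifted by a random amount in [-width, width]."""
    rng = random.Random(seed)
    return [p + rng.uniform(-width, width) for p in positions]


# Made-up example: five people in two groups (0 and 1) with near-identical ages.
group_positions = [0, 0, 0, 1, 1]
ages = [34, 34, 35, 34, 34]

# Only the horizontal plotting position is jittered; the ages are left alone.
jittered_positions = jitter_positions(group_positions, width=0.1, seed=1)

for x, age in zip(jittered_positions, ages):
    print(f"x = {x:.3f}, age = {age}")
```

The ages list is never modified, which is the point: jitter changes where a marker is drawn, not the value it represents.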

Using Jitter in Power BI

There are a few options here. You could use Excel to create another column in your data with the jittered values before loading your dataset.

Or you could use Power Query/DAX to calculate the jittered values once data is in Power BI.

Both approaches require you to know beforehand that you need to jitter the data, and to do extra work to calculate this jitter. But if you use some native Python visualizations, this work is done for you and it's easy to turn jittering on and off.

Download PBIX File and Dataset

Enter your email address below to download the sample file and data.


Python Visualizations in Power BI

Power BI supports the use of Python to create visualizations. This is very useful if Power BI doesn't already support the type of visualization you want to use, or if you can't find a good custom visual to meet your needs.

Actually using a Python chart isn't as complicated as you may think. You follow the usual steps to get your data into Power BI then drag the fields into the Values area. A few simple modifications to the Python code and that's all you need.

But before we get into Power BI, you need to make sure you have Python installed on your computer so that Power BI Desktop can run your script.

You can download and install Python from the main Python.org page.
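Power BI's Python visuals also rely on the pandas and Matplotlib libraries being installed alongside Python, and the plots in this article need Seaborn as well. As a rough check, the sketch below reports which of those three are missing from the Python installation you run it with. The list of names is my assumption about what this article needs, and you should run it with the same Python installation that Power BI Desktop is pointed at.

```python
import importlib.util

# Libraries this article relies on (assumed list: pandas and matplotlib for
# Python visuals in general, seaborn for the strip plot).
required = ["pandas", "matplotlib", "seaborn"]

# find_spec returns None when a top-level package cannot be found.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Install these before continuing:", ", ".join(missing))
else:
    print("All required libraries were found.")
```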

Creating the Plot

With Python installed, the first thing you want to do after starting Power BI Desktop is load the dataset from a CSV file.

I'm using a dataset that shows, amongst other things, the total bill for meals at a restaurant over a four-day period, Thursday to Sunday.

With the dataset loaded, click on the Python visual icon

Python visual icon in Power BI

Power BI will ask you to enable script visuals so click on the Enable button.

enable script visuals

Now drag the day and total_bill fields into the Values area and set them both to Don't summarize.

Don't summarize values in Power BI

When you drag fields into the Values area, you'll see the Python script editor appear at the bottom of the window. This is where the visualization code goes.

When your Python code is ready, you click on the Run script button to draw the visualization.

Python Script Editor - Run script button

Python in Power BI works with a data structure called a DataFrame, which is created for you automatically. Think of the DataFrame as a table. Power BI names this DataFrame dataset, and it contains the fields you dragged into the Values area.

If you have a field called day, the Python script accesses the data in the day column by using the term dataset['day'].
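If you want to experiment outside Power BI, you can build a stand-in for dataset yourself. The sketch below does that with pandas; the rows are made-up values chosen only to resemble the fields described here, not the contents of the sample file.

```python
import pandas as pd

# Stand-in for the DataFrame that Power BI creates automatically.
# These values are made up; inside Power BI you would not write this part.
dataset = pd.DataFrame({
    "day": ["Thu", "Thu", "Fri", "Sat", "Sat", "Sun"],
    "total_bill": [10.5, 10.6, 12.0, 20.3, 20.4, 15.8],
})

# Access a single column by name, exactly as the Power BI script would.
days = dataset['day']
bills = dataset['total_bill']

print(days.unique())
print(bills.describe())
```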

For this visual I'm using a Strip Plot from the Seaborn visualization library. A Strip Plot is essentially a scatter chart for categorized data. Along the x axis are the days (the category) and on the y axis is the bill amount. The code to draw this is just

Seaborn code without jitter
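The original code was shown as a screenshot, so what follows is a sketch of what a script of this kind typically looks like rather than a copy of it. It assumes the fields are named day and total_bill as above. The dataset assignment is a made-up stand-in so the script also runs outside Power BI; inside Power BI you would leave it out.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in for the DataFrame that Power BI creates automatically.
# These values are made up; delete this assignment inside Power BI.
dataset = pd.DataFrame({
    "day": ["Thu", "Thu", "Thu", "Fri", "Fri", "Sat", "Sat", "Sat", "Sun", "Sun"],
    "total_bill": [10.5, 10.6, 18.2, 12.0, 12.1, 20.3, 20.4, 25.0, 15.8, 15.9],
})

# Days on the x axis, bill amounts on the y axis, jitter switched off.
ax = sns.stripplot(x="day", y="total_bill", data=dataset, jitter=False)

# Power BI renders the figure that is current when plt.show() is called.
plt.show()
```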

We end up with a chart like this

Strip plot without jitter

You can see that because we have so many data points of similar value we get overplotting. This doesn't give a good feel for the frequency or distribution of the data.

Adding Jitter

To add some jitter, just set the jitter parameter to True

Seaborn code using jitter
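Again the original was a screenshot, so this is a sketch under the same assumptions: fields named day and total_bill, and a made-up stand-in for dataset that you would drop inside Power BI. The only change from a non-jittered strip plot is the jitter argument.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in for the DataFrame that Power BI creates automatically.
# These values are made up; delete this assignment inside Power BI.
dataset = pd.DataFrame({
    "day": ["Thu", "Thu", "Thu", "Fri", "Fri", "Sat", "Sat", "Sat", "Sun", "Sun"],
    "total_bill": [10.5, 10.6, 18.2, 12.0, 12.1, 20.3, 20.4, 25.0, 15.8, 15.9],
})

# jitter=True spreads the points horizontally within each day.
ax = sns.stripplot(x="day", y="total_bill", data=dataset, jitter=True)

plt.show()
```

Seaborn also accepts a number for jitter instead of True, which sets how far the points are allowed to spread.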

and you end up with this plot

Strip plot with jitter

It's now easier to see how many data points we have, but we can improve things further by making the points a bit bigger and by making each marker's outer edge white. This will let us see more clearly where points are still plotted over each other.

NOTE: Each time you replot the chart the jitter is recalculated, so the points will end up in different positions each time you run the script.

Our plotting code is now

Strip plot marker style
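A sketch of how this styling can be expressed with stripplot's size, edgecolor and linewidth parameters follows. The specific values (size 8, linewidth 1) are my own guesses rather than the ones in the original screenshot, and the dataset assignment is a made-up stand-in that you would leave out inside Power BI.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in for the DataFrame that Power BI creates automatically.
# These values are made up; delete this assignment inside Power BI.
dataset = pd.DataFrame({
    "day": ["Thu", "Thu", "Thu", "Fri", "Fri", "Sat", "Sat", "Sat", "Sun", "Sun"],
    "total_bill": [10.5, 10.6, 18.2, 12.0, 12.1, 20.3, 20.4, 25.0, 15.8, 15.9],
})

ax = sns.stripplot(
    x="day",
    y="total_bill",
    data=dataset,
    jitter=True,
    size=8,             # larger markers than the default (assumed value)
    edgecolor="white",  # white outline around each marker
    linewidth=1,        # the outline is only drawn when the width is above 0
)

plt.show()
```

The linewidth matters here: stripplot's default edge width is 0, so setting edgecolor on its own would have no visible effect.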

resulting in

Strip plot with jitter and edgecolor

The default for a strip plot is for jitter to be on, but you may not always want to use it, and for the sake of this example I'm explicitly turning it off and on to demonstrate its effect.
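As noted earlier, the jitter is recalculated on every run. If you would rather have the points land in the same places each time, one option is to seed NumPy's random number generator before plotting. This is a sketch that relies on an assumption: that Seaborn draws its jitter from NumPy's global random state. That matches the behaviour I have seen, but I am not aware of it being a documented guarantee. The seed value 42 and the stand-in dataset are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Stand-in for the DataFrame that Power BI creates automatically.
# These values are made up; delete this assignment inside Power BI.
dataset = pd.DataFrame({
    "day": ["Thu", "Thu", "Thu", "Fri", "Fri", "Sat", "Sat", "Sat", "Sun", "Sun"],
    "total_bill": [10.5, 10.6, 18.2, 12.0, 12.1, 20.3, 20.4, 25.0, 15.8, 15.9],
})

# Fix the seed so the random offsets are the same on every run
# (assumes the jitter comes from NumPy's global random state).
np.random.seed(42)

ax = sns.stripplot(x="day", y="total_bill", data=dataset, jitter=True)

plt.show()
```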

Summary

When you have data points plotted over the top of each other, jitter is useful to spread those points out and lets you understand the data better.

Download the sample PBIX file and dataset (above) and give it a go yourself.


AUTHOR Philip Treacy Co-Founder / Owner at My Online Training Hub

Systems Engineer with 30+ years working for companies like Credit Suisse and E.D.S. in roles as varied as Network & Server Support, Team Leader and Consultant Project Manager.

These days Philip does a lot of programming in VBA for Excel, as well as PHP, JavaScript and HTML/CSS for web development.

He's particularly keen on Power Query where he writes a lot of M code.

When not writing blog posts or programming for My Online Training Hub, Philip can be found answering questions on the Microsoft Power BI Community forums where he is a Super User.
