4 Easy Ways to Instantly Improve Your Data Visualisations
Creating effective data visualisations is a critical skill within many disciplines, be it business analysis, data science or geoscience. Aesthetically pleasing and easy-to-understand data visualisations can help spark ideas in our target audience or get them to take action based on the information we display.
Within the Python world, there are several data visualisation libraries available. However, many learners of Python and data science start out with matplotlib.
Matplotlib provides a versatile way to present your data however you want. In my previous articles, I have shown various data visualisations that go a few steps beyond the default charts.
However, creating these figures does require patience and extra Python code. This often means a lot of searching on Stack Overflow or through the library documentation to find out how to modify even the smallest parts of a figure.
By following a few simple guidelines, we can immediately improve any figure created with matplotlib.
In this article, I share four of my favourite guidelines I regularly use when creating figures for sharing on Medium or in academic publications.
These guidelines are not necessarily restricted to matplotlib; they can equally be applied to any software that allows you to create charts, such as Excel or Tableau.
Remove Chart Junk and Keep It Simple
One of the quickest and easiest ways to improve matplotlib charts is to reduce the amount of "chart junk" displayed.
Chart junk refers to the unnecessary and confusing elements on the chart that don't really add any value to the reader or the data being presented.
When building your chart, you should ensure you only include elements that help the reader understand the data better.
Here are a few things you can do to make your figures clearer:
- Use titles and labels sparingly but effectively
- Avoid complex vocabulary and jargon
- Remove unnecessary gridlines and borders
- Remove background images
- Avoid overly ornate fonts
- Avoid using unnecessary special effects such as 3D effects and shadows
As an example, we have the following chart, which is used to illustrate income vs. age. This chart has several elements that make it difficult to read and interpret, such as gridlines, point labels and colour clashes between the background and points.
If we spend a little time removing this unnecessary chart junk, we could end up with a figure like the one below.
It is much cleaner and easier for the reader to look at and interpret. We have also added a linear regression line, which can help show the overall trend in the data.
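A clean-up along these lines could be sketched as follows. This is a minimal illustration rather than the code behind the charts in this article: the age and income values are randomly generated, and the colours and labels are assumptions chosen for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, illustrative data only (not the data behind the article's chart)
rng = np.random.default_rng(42)
age = rng.uniform(20, 65, 60)
income = 15_000 + 900 * age + rng.normal(0, 6_000, 60)

fig, ax = plt.subplots(figsize=(8, 6))

# Plot the points in a single muted colour on a plain background
ax.scatter(age, income, color="steelblue", alpha=0.7)

# Fit a straight line with least squares and draw it to show the overall trend
slope, intercept = np.polyfit(age, income, 1)
x_line = np.array([age.min(), age.max()])
ax.plot(x_line, slope * x_line + intercept, color="darkorange", linewidth=2)

# Strip elements that do not help the reader
ax.grid(False)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# Keep only the labels the reader needs
ax.set_xlabel("Age (years)")
ax.set_ylabel("Income")
ax.set_title("Income vs. Age")
```

Calling `plt.show()` or `fig.savefig(...)` afterwards displays or saves the figure.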
Here is another example which illustrates this process of removing excess chart junk.
One of the keys to creating effective charts is to let the data speak for itself. This means prioritising how the data is displayed above all other elements on the chart, so that the insights from the data analysis come across clearly.
Choose Appropriate Colours
Choosing appropriate colours for a chart may appear to be a simple task. However, it can very quickly become a huge time sink, and you may end up spending hours deciding on the right shades of blue or whether there is enough contrast between the colours you have chosen.
Selecting appropriate colours can heavily influence the readability of a chart and, therefore, how readers interpret the data.
For example, in the following chart, we have five different categories all represented by a single colour. Whilst the plot is readable, the lack of effective colour use means we are not drawing the reader's attention to any particular aspect of the data.
If we change the colouring so that Category C is highlighted in orange, we immediately grab the reader's attention and suggest that this particular category is important.
If, on the other hand, we go the opposite way and use a random colour for every category, we can end up with a busy and confusing-looking chart. However, there may be cases where you want to use different colours for each bar, for example, when distinguishing between different company branding.
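The highlighting approach described above can be sketched in a few lines. The category values here are made up for illustration, and the choice of light grey for the non-highlighted bars is an assumption; any muted colour would work.

```python
import matplotlib.pyplot as plt

# Made-up values for five categories, for illustration only
categories = ["A", "B", "C", "D", "E"]
values = [23, 17, 35, 29, 12]

# Use one accent colour for the category we want to emphasise
# and a muted colour for everything else
highlight = "C"
colours = ["darkorange" if c == highlight else "lightgrey" for c in categories]

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(categories, values, color=colours)
ax.set_ylabel("Value")
ax.set_title("Category C stands out")
```

Changing `highlight` to another category name is enough to shift the reader's attention elsewhere.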
When picking colours, there are many nuances, which will depend on numerous factors, including the type of chart, the data and the message you are trying to convey.
However, there are a few general rules that can help make your figures appear more polished and professional:
- Use Colour to Highlight Information, Not Distract: Colours should be used to draw attention to the most crucial aspects of your data.
- Be Consistent: When creating multiple visualisations, maintaining consistency helps your audience quickly understand new visualisations based on experience with previous ones. For example, if you are using blue for a particular category in one chart, try to use the same colour for the same category in other charts.
- Consider Colour Vision Issues / Blindness: It is important to consider people with colour blindness when creating your charts. For example, avoid colours that are known to be problematic, such as red and green or blue and yellow.
- Understand Colour Psychology: The meaning behind colours can have important implications and can also vary between cultures. For example, red is often seen as a negative colour or a warning of danger, whereas green is seen as a positive or an indicator of growth.
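One way to keep colours consistent across charts is to define the category-to-colour mapping once and reuse it everywhere. The sketch below assumes four hypothetical categories and uses hex codes from the Okabe-Ito palette, which is widely recommended for colour-blind-friendly figures. The `plot_categories` helper and the sales figures are invented for this example.

```python
import matplotlib.pyplot as plt

# One shared mapping, so a category looks the same in every chart.
# Hex codes are taken from the Okabe-Ito palette.
CATEGORY_COLOURS = {
    "A": "#0072B2",  # blue
    "B": "#E69F00",  # orange
    "C": "#009E73",  # bluish green
    "D": "#CC79A7",  # reddish purple
}


def plot_categories(ax, data, title):
    """Draw a bar chart where each category always gets its mapped colour."""
    names = list(data)
    ax.bar(
        names,
        [data[name] for name in names],
        color=[CATEGORY_COLOURS[name] for name in names],
    )
    ax.set_title(title)


# Hypothetical values; note the categories differ and appear in a different order
sales_2022 = {"A": 10, "B": 14, "C": 8}
sales_2023 = {"C": 11, "A": 12, "D": 6}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
plot_categories(ax1, sales_2022, "2022")
plot_categories(ax2, sales_2023, "2023")
```

Because the colour is looked up by category name rather than by position, category A stays blue in both panels even though it is the first bar in one and the second in the other.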
I would highly recommend checking out the following articles for a more in-depth look at understanding colour choices:
- How to choose colors for data visualizations
- How to pick more beautiful colors for your data visualisations
- Common Pitfalls of Color Use
There are also multiple colour palette generators out there that can help select the most appropriate palette. By using these tools, you can save a huge amount of time and ensure you achieve maximum readability, especially for those with colour vision issues.
Save Time and Code By Applying a Matplotlib Theme
If you are a regular reader of my articles, you will have seen I have covered several matplotlib theme libraries in recent months. These theme libraries allow you to instantly transform your figures from the boring standard colour scheme of Matplotlib into something that is much more aesthetically pleasing.
Not only do they help with how the figures look, but they can also help improve interpretability.
There are numerous matplotlib theme libraries available, including mplcyberpunk, which lets you transform your matplotlib figure into a futuristic graph with glowing neon colours.
For example, to create a cyberpunk-themed image with mplcyberpunk, we can use the following code:
import matplotlib.pyplot as plt
import mplcyberpunk  # importing registers the 'cyberpunk' style
import numpy as np
# Generate x values
x = np.linspace(0, 10, 20)
# Generate y values
y = np.sin(x)
y2 = np.cos(x)
plt.style.use('cyberpunk')
plt.figure(figsize = (8,8))
plt.plot(x, y, marker = 'o')
plt.plot(x, y2, marker = 'o', c='lime')
# Add a glowing gradient fill between each line and zero
mplcyberpunk.add_gradient_fill(alpha_gradientglow=0.5, gradient_start='zero')
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')
plt.show()
This generates the following image:
Even though the mplcyberpunk theme creates eye-catching figures, it should be used cautiously. It may not be seen as professional by some people, and it could potentially obscure the data and the message you are trying to convey.
If you are looking for something suitable for inclusion within an academic publication, then the SciencePlots library may be of interest.
The SciencePlots library contains numerous styles, which make it easy to set up figures if you are writing scientific or journal articles. It also includes support for multiple languages, including Chinese and Japanese.
For example, the figure below shows how the same data we used in the Cyberpunk-themed chart would look in a form suitable for inclusion within an academic journal article.
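A figure in this style might be produced along the following lines. This is a sketch rather than the exact code behind the figure: it assumes the `science` and `no-latex` styles that SciencePlots registers when imported, and it falls back to matplotlib's default style if the library is not installed. The figure size and legend labels are also assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

try:
    import scienceplots  # noqa: F401  (importing registers the styles)
    style = ["science", "no-latex"]  # 'no-latex' avoids needing a LaTeX install
except ImportError:
    style = "default"  # fallback so the sketch still runs without SciencePlots

# Same data as the cyberpunk example
x = np.linspace(0, 10, 20)
y = np.sin(x)
y2 = np.cos(x)

# Apply the style only to this figure instead of changing global settings
with plt.style.context(style):
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot(x, y, marker="o", label="sin(x)")
    ax.plot(x, y2, marker="o", label="cos(x)")
    ax.set_xlabel("X-Axis")
    ax.set_ylabel("Y-Axis")
    ax.legend()
```

Using `plt.style.context` keeps the style local to one figure, which is handy when a notebook mixes themed and unthemed charts.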
You can find out more about the SciencePlots library in my article below:
Creating Scientific Plots the Easy Way With scienceplots and matplotlib
If you are interested in an overview of some of the common theme libraries I have mentioned above, you may be interested in checking out my other article below.
Upgrade Your Data Visualisations: 4 Python Libraries to Enhance Your Matplotlib Charts
Consider Your Audience and the Story You Are Telling
When creating data visualisations, one of the most important things to keep in mind is who your audience is and the story that you want to tell.
Instead of presenting all the available data to the user in a large range of confusing and complex charts, it is best to distil the data and information down to the most relevant parts. This will depend on what the objective of the data analysis is, which could be defined by a client, a research project or an event organiser.
Additionally, we need to account for our audience's background.
Are they technical-minded people with high data literacy, or should our visualisations be simplified for a wider non-technical audience?
For example, suppose we have been tasked with presenting average porosity values for the Hugin Formation, obtained from a series of wells on the Norwegian Continental Shelf.
Our first attempt might be something like the following:
# Assumes df is a pandas DataFrame with 'well' and 'porosity' columns
fig, ax = plt.subplots(figsize=(8,8))
bars = ax.barh(df['well'], df['porosity'])
plt.show()
This would create the following bar chart.
However, when readers look at this, they have to do a lot of work to figure out what is happening.
They will be asking:
- What do the bars represent?
- What well has the highest porosity?
- Is there a difference between well 16/10-1 and well 25/8-7?
- What wells are considered to have a "good" porosity?
Trying to answer these questions takes a fair bit of effort, and most people will skip the figure and move on.
If we try to improve our figure and answer these questions without too much effort on the reader's part, we can end up with something like this.
Right away, we can answer the questions:
- What do the bars represent? Average porosity values in the Hugin Formation
- What well has the highest porosity? 25/8-7 at 26.1%
- Is there a difference between well 16/10-1 and well 25/8-7? Yes, a 0.1% difference
- What wells are considered to have a "good" porosity? Wells highlighted in orange that occur above a 20% cutoff value
We have made the reader's work much easier by simplifying the figure and improving the aesthetics.
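The kinds of changes described above could be sketched like this. The well names and porosity values below are placeholders invented for the example, not the values from the dataset, and the colours and label offsets are assumptions. With a pandas DataFrame, the sorting step would typically be `df.sort_values('porosity')` instead.

```python
import matplotlib.pyplot as plt

# Placeholder wells and porosity values (percent), for illustration only
wells = ["Well A", "Well B", "Well C", "Well D", "Well E"]
porosity = [26.1, 14.2, 21.5, 18.7, 23.0]
cutoff = 20  # assumed threshold for "good" porosity, in percent

# Sort ascending so the highest porosity ends up at the top of the chart
pairs = sorted(zip(porosity, wells))
sorted_porosity = [p for p, _ in pairs]
sorted_wells = [w for _, w in pairs]

# Highlight wells above the cutoff, mute the rest
colours = ["darkorange" if p > cutoff else "lightgrey" for p in sorted_porosity]

fig, ax = plt.subplots(figsize=(8, 6))
bars = ax.barh(sorted_wells, sorted_porosity, color=colours)

# Label each bar with its value so readers need not read it off the axis
for bar, value in zip(bars, sorted_porosity):
    ax.text(value + 0.3, bar.get_y() + bar.get_height() / 2,
            f"{value:.1f}%", va="center")

# Mark the cutoff and say what the bars represent
ax.axvline(cutoff, color="black", linestyle="--", linewidth=1)
ax.set_title("Average Porosity by Well")
ax.set_xlabel("Average porosity (%)")

# Remove borders that add nothing
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
```

Each addition answers one of the reader's questions: the title and axis label explain the bars, the sorting and value labels show the ranking and the small differences, and the cutoff line with highlighted bars marks which wells count as "good".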
We can also change the narrative of the figure by highlighting one bar. For instance, the chart below may be part of a larger infographic about well 16/2-16; by highlighting that bar, we immediately draw attention to it.
Summary
Creating effective data visualisations is a valuable skill that is worth learning, especially if you are involved in data science or data analytics.
Within this article, I have shared four of my favourite guiding principles for creating effective visualisations. There are many more methods out there that can be used to improve figures.
It would be great to hear about your favourite rules for creating effective data visualisations in the comment section of this article.
Datasets Used in this Article
Training dataset used as part of a Machine Learning competition run by Xeek and FORCE 2020 (Bormann et al., 2020). This dataset is licensed under Creative Commons Attribution 4.0 International.
The full dataset can be accessed at the following link: https://doi.org/10.5281/zenodo.4351155.