Charts that Tell a Story: Turn a Plotly Visualization into Something More

Author: Murphy | Time: 2025-03-23 19:09:42

Data visualizations convey ideas that dataframes and tables cannot. But telling a story effectively through data visualization requires an aesthetically pleasing, interpretable chart that provides the context necessary for it to stand on its own.

Fortunately, Python has numerous data visualization libraries, such as Plotly Express, that can create a chart with a single line of code [1]. While useful, such charts rarely stand alone in a formal publication or survive scrutiny without further context; a professional, standalone chart that tells a story through data requires a little extra work. This article walks through the steps to elevate a data visualization to the next level.

Code:

Code for this walkthrough is available on the accompanying GitHub page. Feel free to download the code and follow along in a Jupyter notebook: click "Code" and then "Download ZIP" to get the .ipynb file.

1. Data Preparation and Initial Visualization

The libraries used are:

# Data Handling
import pandas as pd

# Data visualization Libraries
import seaborn as sns
import plotly.express as px
import plotly.io as pio

Returning to an old favorite, this walkthrough uses the Seaborn library's automotive dataset (named 'mpg'), which provides great data on vehicle fuel consumption during the energy crisis era [2]. Using the groupby function, this data yields the average vehicle miles per gallon (MPG) for each model year from 1970 to 1982. Here is the code to load and prepare the dataframe:

# Load in data:
mpg = sns.load_dataset('mpg')

# Get dataset showing average MPG per year by using groupby:
mpg = mpg.groupby(['model_year'])['mpg'].mean().to_frame().reset_index()

# Rename columns:
mpg = mpg.rename(columns={'model_year': 'Year',
                          'mpg': 'Average MPG'})
mpg.head()

The dataframe head should look like this:

One line of code in Plotly Express gives the following bar chart:

px.bar(mpg, x='Year', y='Average MPG')

Only a few lines of code were necessary to prepare and visualize the data. But would such a visualization be worthy of publication in a magazine or formal business report? Probably not.
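The groupby chain in the preparation step (`.mean().to_frame().reset_index()`) can also be written with `as_index=False`, which returns a flat DataFrame directly. The sketch below checks that the two forms agree on a tiny hand-made frame; the values are invented for illustration and are not taken from the Seaborn dataset.

```python
import pandas as pd

# Tiny made-up sample standing in for the Seaborn 'mpg' data (values are illustrative only)
cars = pd.DataFrame({
    "model_year": [70, 70, 71, 71, 72],
    "mpg": [18.0, 15.0, 27.0, 24.0, 20.0],
})

# Form used in the article: Series -> DataFrame -> index moved back into a column
chained = cars.groupby(["model_year"])["mpg"].mean().to_frame().reset_index()

# Compact alternative: as_index=False keeps 'model_year' as a regular column
compact = cars.groupby("model_year", as_index=False)["mpg"].mean()

# Same renaming step as in the article
yearly = compact.rename(columns={"model_year": "Year", "mpg": "Average MPG"})
print(yearly)
```

Either form works; the `as_index=False` version simply skips the intermediate Series.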

2. The Next Level: Labels, Formats, Colors

First, let's look at the chart's color scheme. The following code colors the bars by their average MPG and changes the template:

# Generate base plot:
plot = px.bar(mpg, x='Year', y='Average MPG', color='Average MPG',
              color_continuous_scale=px.colors.diverging.RdYlGn)

# Remove colorbar:
plot.update_coloraxes(showscale=False)

# Update plotly style:
plot.update_layout(template='plotly_white')

plot.show()

The updated chart now appears like this:

The first change in the above code was passing the 'Average MPG' column to color. While technically redundant ('Average MPG' is already captured by the y-value, or bar height), coloring by 'Average MPG' allows a color scale to aid quick interpretation. In this case, the Plotly diverging color scale "RdYlGn" maps nicely to the data: less efficient years are red, more efficient years are green, and the years in between are orange and yellow. Discover other Plotly color schemes here [3].

Because the color and y-values are the same, the colorbar that Plotly would normally include is redundant, so the code removes it. Finally, the code changes the default Plotly template to 'plotly_white'. Read more about Plotly themes here [4].

Note: If considering colorblind-friendly schemes, read this Towards Data Science article on how to adjust color schemes [5]. For this article, the Red-Yellow-Green scale is retained both as a proof of concept and because the MPG information is still conveyed by the y-axis value (bar height).

3. Labels

The next steps label the axes:

# Label axes:
plot.update_xaxes(title='Model Year',
                  dtick=1)
plot.update_yaxes(title='Average Miles Per Gallon (MPG)')

Plotly normally defaults each axis title to the name of the column plotted on that axis, but update_xaxes() and update_yaxes() allow further customization [6]. Note that MPG is spelled out as "Miles Per Gallon" to avoid confusing the viewer with acronyms. Additionally, the x-axis "dtick" value is set to 1, which ensures every bar along the x-axis has its model year labeled below it.

The following code updates the title:

# Update plot layout:
plot.update_layout(
    title=dict(
        text='<b>Average Miles Per Gallon of Cars Over Time</b><br>' \
             '<sup><i>A Visualization of Improvements in ' \
             'Fuel Efficiency During the Energy Crisis Era</i></sup>',
        x=0.085,
        y=0.95,
        font=dict(
            family='Helvetica',
            size=25,
            color='#272b4f'
        )))

The function update_layout() allows for the addition of a title [7]. Note that the text incorporates HTML-style tags: <b> and </b> create bold text for the contents within; <i> and </i> do the same for italics; <br> inserts a line break; and lastly, <sup> and </sup> render the chart's subtitle as superscript (smaller, raised text). The backslash (\) at the end of each line of the title lets the text continue on the next line of code, creating cleaner code blocks.

The font family, size, and color are also adjustable. Find other colors easily with Google's color picker [8].

Finally, we use the add_annotation() function to annotate the data source on our chart [9]:

# Add annotation on data source:
plot.add_annotation(x=0,
                    y=-0.15,
                    showarrow=False,
                    text="Fuel mileage data courtesy of Python Seaborn Library",
                    textangle=0,
                    xanchor='left',
                    xref="paper",
                    yref="paper",
                    font_color='#a6aeba')

All of this results in the following:

Note how the chart has a title and subtitle that introduce the context for the data, while the axes provide unambiguous labels and the bars' height and color quickly show each year's average MPG. Additionally, there is a data source reference at the bottom. This could be a good stopping point for a chart headed to publication, but to show how far a visualization can go, the next section introduces a few more options.

4. Add Annotations

To add more context to the chart, the following code creates a horizontal line representing the average MPG from 1970 to 1982 as well as an annotation box describing the line:

# Add average MPG across era:
plot.add_hline(y=mpg['Average MPG'].mean())

# Add explanation of line:
plot.add_annotation(x=.05,
                    y=0.67,
                    text="Average MPG, 1970 through 1982",
                    textangle=0,
                    xanchor='left',
                    xref="paper",
                    yref="paper",
                    font_color='black',
                    bordercolor='black',
                    borderpad=5,
                    showarrow=True,
                    arrowhead=2,
                    bgcolor='white',
                    arrowside='end'
                    )

When added to the chart, this yields:

Note how the add_annotation() function allows the creation of a border (bordercolor='black') and an arrow. The x and y values in the add_annotation() function position the callout and may require some trial and error to get the box at the desired spot.

This line and callout box further emphasize that MPG improved significantly in 1980, 1981, and 1982. They also show that from 1975 to 1979, average MPG improved consistently year over year. Suppose it is necessary to highlight this period of improvement; it can be done with the following code:

# Add highlight box (bars sit on integer years, so 74.5 to 79.5 spans model years 75 through 79):
plot.add_vrect(x0=74.5,
               x1=79.5,
               fillcolor="lightgray",
               opacity=0.3,
               line_width=0)

# Label the highlighted period:
plot.add_annotation(x=.45,
                    y=0.9,
                    text="Period of Consistent Improvement<br>"
                         "until Breakthrough in 1980s",
                    textangle=0,
                    xanchor='left',
                    xref="paper",
                    yref="paper",
                    font_color='black',
                    showarrow=False,
                    )

The function add_vrect() creates a shaded rectangle spanning the full height of the plot between two x-values [10]; in this case, it highlights the period of consistent improvement in the late 1970s. Placement of the add_vrect() code matters: shapes are drawn in the order they are added, so placing it after the add_hline() function means the rectangle will be drawn over the horizontal line rather than under it, as seen below:

It may seem like there's nothing left to add, but there is one more possibility: an explanation for the rise in fuel economy. Let's assume the chart is part of a study that found decreasing engine sizes directly contributed to improvements in MPG. Fortunately, the Seaborn MPG data includes engine displacement. With a little data preparation and one more annotation, the final piece is ready for the chart:

# Data prep: reload the full dataset (mpg now holds only yearly averages)
displacement = sns.load_dataset('mpg')
# Average engine displacement (cubic inches) for model years 70-79 and 80-82:
seventies = round(
    displacement[displacement['model_year'] < 80]['displacement'].mean(), 2)
eighties = round(
    displacement[displacement['model_year'] >= 80]['displacement'].mean(), 2)

# Create text string (<br> inserts line breaks inside the annotation):
explanation = ("Why the Improvement in MPG?<br>"
               "In the 1970s, average engine size was {}<br>"
               "cubic inches versus {} from 1980 to 1982.<br>"
               "Larger engines are usually less efficient.").format(seventies, eighties)

# Add explanation for trends:
plot.add_annotation(x=.615,
                    y=0.02,
                    text=explanation,
                    textangle=0,
                    xanchor='left',
                    xref="paper",
                    yref="paper",
                    font_color='black',
                    bordercolor='black',
                    borderpad=5,
                    bgcolor='white',
                    showarrow=False
                    )

The data preparation step calculates the average engine displacement for model years 70 through 79 and for the three 1980s model years (80, 81, and 82). These two values are then inserted into a text string, which becomes the text value in the add_annotation() call. The final chart appears as follows:

This chart can now stand alone and tell the story of average yearly fuel mileage during the energy crisis era. Some final considerations:

Less is generally more: stop adding to a chart once it conveys the information its context requires.
A standalone infographic might need more elements, while a chart shown during an oral presentation or within a written report can get by with fewer.
Color schemes, font choices, and sizes impact readability and accessibility.
If the customer cannot understand the message of a chart, it's not the customer's fault.

5. Conclusion

Python's various visualization libraries, including Plotly Express, offer a quick way to generate highly customizable charts, ranging from a basic visualization to a fully customized standalone product. Building visualizations with a clear understanding of the story they need to convey produces the most effective messaging. Experiment with the code yourself, and feel free to download the full notebook from the GitHub page.

Generate the Final Chart in a Single Block of Code:

# Load in Libraries:

# Data Handling
import pandas as pd

# Data visualization Libraries
import seaborn as sns
import plotly.express as px

# Load in data:
mpg = sns.load_dataset('mpg')

# Get dataset showing average MPG per year by using groupby:
mpg = mpg.groupby(['model_year'])['mpg'].mean().to_frame().reset_index()

# Rename columns:
mpg = mpg.rename(columns={'model_year': 'Year',
                          'mpg': 'Average MPG'})

# Generate base plot:
plot = px.bar(mpg, x='Year', y='Average MPG', color='Average MPG',
              color_continuous_scale=px.colors.diverging.RdYlGn)

# Remove colorbar:
plot.update_coloraxes(showscale=False)

# Update plotly style:
plot.update_layout(template='plotly_white')

# Label axes:
plot.update_xaxes(title='Model Year',
                  dtick=1)
plot.update_yaxes(title='Average Miles Per Gallon (MPG)')

# Add labels and source:

# Update plot layout:
plot.update_layout(
    title=dict(
        text='<b>Average Miles Per Gallon of Cars Over Time</b><br>' \
             '<sup><i>A Visualization of Improvements in ' \
             'Fuel Efficiency During the Energy Crisis Era</i></sup>',
        x=0.085,
        y=0.95,
        font=dict(
            family='Helvetica',
            size=25,
            color='#272b4f'
        )))

# Add annotation on data source:
plot.add_annotation(x=0,
                    y=-0.15,
                    showarrow=False,
                    text="Fuel mileage data courtesy of Python Seaborn Library",
                    textangle=0,
                    xanchor='left',
                    xref="paper",
                    yref="paper",
                    font_color='#a6aeba')

# Add highlight box (bars sit on integer years, so 74.5 to 79.5 spans model years 75 through 79):
plot.add_vrect(x0=74.5,
               x1=79.5,
               fillcolor="lightgray",
               opacity=0.3,
               line_width=0)

# Label the highlighted period:
plot.add_annotation(x=.45,
                    y=0.9,
                    text="Period of Consistent Improvement<br>"
                         "until Breakthrough in 1980s",
                    textangle=0,
                    xanchor='left',
                    xref="paper",
                    yref="paper",
                    font_color='black',
                    showarrow=False,
                    )

# Add average MPG across era

# Create Line:
plot.add_hline(y=mpg['Average MPG'].mean())

# Add explanation of line:
plot.add_annotation(x=.05,
                    y=0.67,
                    text="Average MPG, 1970 through 1982",
                    textangle=0,
                    xanchor='left',
                    xref="paper",
                    yref="paper",
                    font_color='black',
                    bordercolor='black',
                    borderpad=5,
                    showarrow=True,
                    arrowhead=2,
                    bgcolor='white',
                    arrowside='end'
                    )

# Add a box to explain the trends

# Data prep: reload the full dataset (mpg now holds only yearly averages)
displacement = sns.load_dataset('mpg')
# Average engine displacement (cubic inches) for model years 70-79 and 80-82:
seventies = round(
    displacement[displacement['model_year'] < 80]['displacement'].mean(), 2)
eighties = round(
    displacement[displacement['model_year'] >= 80]['displacement'].mean(), 2)

# Create text string (<br> inserts line breaks inside the annotation):
explanation = ("Why the Improvement in MPG?<br>"
               "In the 1970s, average engine size was {}<br>"
               "cubic inches versus {} from 1980 to 1982.<br>"
               "Larger engines are usually less efficient.").format(seventies, eighties)

# Add explanation for trends:
plot.add_annotation(x=.615,
                    y=0.02,
                    text=explanation,
                    textangle=0,
                    xanchor='left',
                    xref="paper",
                    yref="paper",
                    font_color='black',
                    bordercolor='black',
                    borderpad=5,
                    bgcolor='white',
                    showarrow=False
                    )

plot.show()