Building Interactive Data Visualizations with Python - The Art of Storytelling

Author:Murphy  |  View: 28390  |  Time: 2025-03-23 18:27:40

Intro Guide

 

Mastering the art of storytelling is important for data scientists, but especially crucial for data analysts.

Sharing data insights and highlights with people who are unfamiliar with the data, and who may not even come from a technical background, is one of the most important parts of the data analysis journey.

They won't care that you're the best at cleaning and transforming data if they don't understand, or aren't engaged with, what you're saying.

Visualizations are therefore a key part of this final storytelling step, and arguably the best way to showcase facts; that's why everybody uses them.

Furthermore, tools like Tableau or Power BI are on the rise thanks to how easily they let you create interactive dashboards. Business people are usually amazed by cool dashboards full of graphs and colors, so companies have started listing these tools as requirements in their job offers.

Today, I'll share some Python options for creating interactive visualizations, for those who can't use, or would rather not use, the dedicated data visualization tools mentioned above.

You'll no longer have to leave Python behind after all your analysis – you'll also be able to use it to share your insights visually.

More concretely, I'll be talking about four libraries: Seaborn, Bokeh, Plotly, and Dash.

Because going deep into all of them would result in an extremely long post, I'll keep it relatively brief and focus on giving a sneak peek of what they can do. In future posts I might go into more detail on some (or all) of them, one at a time.

Today's table of contents:

  1. First Things First – The Data
  2. Seaborn
  3. Bokeh
  4. Plotly
  5. Dash
  6. Telling the Story

You'll be able to find a link to the whole code at the end of this post, in the Resources section.

Let's start then.

1. First Things First – The Data

There's no way we can tell a story if we don't have a story to tell, right?

As I love the nutrition and health sector, I'll be using publicly available data from the World Health Organization (WHO). Specifically, the indicator "Prevalence of overweight among adults, BMI >= 25 (age-standardized estimate) (%)"[1].

Our aim is to analyze the data and see what we can find. It contains prevalence values by gender, country, and year from 1975 until 2016.

Disclaimer: The results won't surprise you because this is a widely talked-about topic. But we're not here to discover new stuff today.

I'll be using a notebook for simplicity, but keep in mind that, depending on the library, the plots we're about to create can also be embedded into websites, which is where things get really interesting.

I cleaned the data a little to keep it simple and get it ready for what's coming next. Here's the snippet in case you want to follow the same steps:

```python
import pandas as pd

# Load the data
df = pd.read_csv("data.csv")

# Remove missing data and keep only useful columns
df = df.loc[
    df['Value'] != "No data",
    ['ParentLocation', 'Location', 'Period', 'Dim1', 'Value']
].copy()

# Rename columns
df.columns = ['ParentLocation', 'Location', 'Period', 'Gender', 'Prevalence']

# Some values in Prevalence contain ranges (like '13.4 [8.7 – 18.9]'),
# so we keep only the point estimate before the bracket.
# Every value is converted to float (not just the ranged ones) so the column is numeric.
df['Prevalence'] = df['Prevalence'].apply(
    lambda x: float(str(x).split('[')[0])
)
```
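If you'd rather avoid apply(), you can do the same parsing with pandas' vectorized string methods. Here's a minimal sketch on a few made-up rows. The Location and Value entries below are illustrative, not taken from the WHO file. Like the snippet above, it keeps the number before the bracket, which is the point estimate:

```python
import pandas as pd

# Made-up rows mimicking the WHO export's "Value" column
raw = pd.DataFrame({
    "Location": ["A", "B", "C"],
    "Value": ["13.4 [8.7 – 18.9]", "25.0", "No data"],
})

# Drop the "No data" rows, as in the snippet above
clean = raw[raw["Value"] != "No data"].copy()

# Keep the text before "[", trim spaces and convert the whole column to float
clean["Prevalence"] = (
    clean["Value"].str.split("[").str[0].str.strip().astype(float)
)
print(clean[["Location", "Prevalence"]])
```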

And here's what it looks like:

 

We're ready to start plotting.

2. Seaborn

I'm sure you're already familiar with Seaborn. Alongside Matplotlib, it's one of the most widely used plotting libraries. What you may not know is that, with a little help, we can use it for dynamic visuals as well, not just static ones.

But let me introduce it just in case it's new to anyone.

In their own words: "Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics."[2]

So it's like Matplotlib's improved version.

I'll start with a simple relplot() to plot the relationship between different variables:

```python
import seaborn as sns

# Apply the style
sns.set_style("ticks")

# Prepare df for this graph: average prevalence per region, gender and year
grouped_df = df.groupby(
    ['ParentLocation', 'Gender', 'Period']
)['Prevalence'].mean().reset_index()

# Create visualization (relplot already removes the top and right spines,
# so a separate sns.despine() call isn't needed)
sns.relplot(
    data=grouped_df[grouped_df['Gender'] != 'Both sexes'],
    kind="line",
    x="Period",
    y="Prevalence",
    col="Gender",
    hue="ParentLocation",
    palette="Paired"
)
```
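To see what grouped_df actually holds, here's the same aggregation on a tiny made-up frame. The result is a plain average over countries, so a small country weighs as much as a populous one. That's fine for a quick visual comparison, but it isn't a population-weighted regional figure.

```python
import pandas as pd

# Made-up rows: two European countries and one African country, same gender and year
toy = pd.DataFrame({
    "ParentLocation": ["Europe", "Europe", "Africa"],
    "Location": ["X", "Y", "Z"],
    "Gender": ["Male", "Male", "Male"],
    "Period": [2016, 2016, 2016],
    "Prevalence": [60.0, 50.0, 30.0],
})

# One row per (region, gender, year); Prevalence is the unweighted mean over countries
grouped = (
    toy.groupby(["ParentLocation", "Gender", "Period"])["Prevalence"]
    .mean()
    .reset_index()
)
print(grouped)
```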

 

The visualization per se is not what's important here. We'll draw some insights in the last section of this post when we simulate storytelling.

What we have is a static plot: we can't change anything without editing the code. That's fine if we want to compare on a ParentLocation basis, but we can't get granular at the country level, since doing so would cram a lot of lines into the same confusing chart.

This isn't what we came for.

Seaborn on its own doesn't let us build interactive dashboards. However, we can pair it with ipywidgets to add some interactivity.

If the next snippet doesn't work on your system, you may need to enable the widgets extension (this applies to the classic Jupyter Notebook interface) by running the following command in your terminal: jupyter nbextension enable --py --sys-prefix widgetsnbextension

We'll now plot the same relplot(), but this time comparing countries interactively. Here's how you can do it:

```python
import ipywidgets as widgets
from IPython.display import display

countries = sorted(pd.unique(df['Location']))

# Creates a Dropdown widget listing every country
def create_dd(desc, i=0):
    dd = widgets.Dropdown(
        options=countries,
        value=countries[i],
        description=desc
    )
    return dd

# Creates the relplot for the two selected countries
def draw_relplot(country1, country2):
    sns.relplot(
        data=df[
            (df['Location'].isin([country1, country2]))
            & (df['Gender'] != 'Both sexes')
        ],
        kind="line",
        x="Period",
        y="Prevalence",
        col="Gender",
        hue="Location",
        palette="Paired"
    )

# Generate the final widget
dd1 = create_dd('Country 1', 0)
dd2 = create_dd('Country 2', 1)
ui = widgets.HBox([dd1, dd2])

# Create the interactive plot and display
out = widgets.interactive_output(
    draw_relplot,
    {'country1': dd1, 'country2': dd2}
)

display(ui, out)
```

This one was longer, but all we did was wrap some code into two functions (one to create a dropdown and the other to create the plot itself) and then build the UI with an HBox widget, a container that lays out the widgets passed to it horizontally. Finally, interactive_output() links the dropdowns to the plotting function, re-running it whenever either value changes, and display() renders everything.
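If the interactive_output() mechanics feel a bit magical, here's a framework-free sketch of roughly the same pattern. Every control notifies its observers when its value changes, and the plotting function is re-run with the current value of all controls. ToyDropdown and toy_interactive_output are hypothetical names made up for this illustration; they are not part of ipywidgets.

```python
class ToyDropdown:
    """Hypothetical stand-in for a widget: holds a value and notifies observers."""

    def __init__(self, value):
        self.value = value
        self._handlers = []

    def observe(self, handler):
        self._handlers.append(handler)

    def set(self, new):
        old, self.value = self.value, new
        for handler in self._handlers:
            handler({"name": "value", "old": old, "new": new})


def toy_interactive_output(func, controls):
    """Call func(**current values) once now, then again on every control change."""
    def rerun(_change=None):
        func(**{name: ctrl.value for name, ctrl in controls.items()})

    for ctrl in controls.values():
        ctrl.observe(rerun)
    rerun()


calls = []
toy_dd1, toy_dd2 = ToyDropdown("Spain"), ToyDropdown("France")
toy_interactive_output(
    lambda country1, country2: calls.append((country1, country2)),
    {"country1": toy_dd1, "country2": toy_dd2},
)
toy_dd2.set("Italy")
print(calls)  # [('Spain', 'France'), ('Spain', 'Italy')]
```

The real ipywidgets version also captures whatever the function draws into an Output widget, which is why the plot appears right under the dropdowns.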

The result:

 

We've added some interaction and dynamism here! We can already compare two countries at a time and see the differences in their overweight prevalence.

Things are getting better.

However, this approach has a limitation. The plots themselves aren't interactive; only the data being shown is dynamic. Is that all we want? Maybe at some point, but not today.

Introducing our next new friend…

3. Bokeh

In Bokeh's own words:

"With just a few lines of Python code, Bokeh enables you to create interactive, JavaScript-powered visualizations displayable in a web browser."[3]

Bokeh is a much more complex and complete option for interactive graphs and dashboards. We can literally create Tableau-like dashboards with it.

The con is that it's not the most user-friendly option. Even the installation can be quite painful for beginners trying to use it on a Jupyter Notebook.

By default, the plots and dashboards we create are rendered in an HTML page or file, but we can render them inline within our notebook by importing and calling the output_notebook() function.

Let's try to replicate the first plot we created with Seaborn, but now using Bokeh to generate an interactive graph:

```python
import pandas as pd
from bokeh.plotting import figure
from bokeh.io import show, output_notebook
from bokeh.layouts import row
from bokeh.palettes import Paired

# grouped_df comes from the Seaborn section
def prepare_figure(gender):
    l = figure(
        title=f"Gender = {gender}",
        x_axis_label='Period',
        y_axis_label='Prevalence',
        width=475,
        height=500,
    )
    # One line per region, each with its own color from the Paired palette
    for i, loc in enumerate(pd.unique(grouped_df['ParentLocation'])):
        subset = grouped_df[
            (grouped_df['Gender'] == gender)
            & (grouped_df['ParentLocation'] == loc)
        ]
        l.line(
            subset['Period'],
            subset['Prevalence'],
            legend_label=loc,
            line_width=2,
            color=Paired[12][i]
        )

    # Legend placement and behavior (clicking an entry mutes its line)
    l.legend.location = 'top_left'
    l.legend.click_policy = "mute"
    l.legend.label_text_font_size = '8px'
    l.legend.background_fill_alpha = 0.4

    return l

# Render the figure inline
output_notebook()

# Create one figure per gender and place them side by side
l1 = prepare_figure('Male')
l2 = prepare_figure('Female')
p = row(l1, l2)

# Show the plot
show(p)
```
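The loop above filters grouped_df twice per region. An alternative is to filter by gender once and let groupby() return one (periods, prevalences) pair per region. Here's a minimal sketch, assuming the same column names as grouped_df. series_by_region is a hypothetical helper, and the data is made up:

```python
import pandas as pd

# Made-up aggregated data shaped like grouped_df
toy_grouped = pd.DataFrame({
    "ParentLocation": ["Africa", "Africa", "Europe", "Europe"],
    "Gender": ["Male"] * 4,
    "Period": [1975, 2016, 1975, 2016],
    "Prevalence": [10.0, 30.0, 40.0, 60.0],
})


def series_by_region(data, gender):
    """Return {region: (periods, prevalences)} for one gender, filtering only once."""
    subset = data[data["Gender"] == gender]
    return {
        region: (part["Period"].tolist(), part["Prevalence"].tolist())
        for region, part in subset.groupby("ParentLocation")
    }


lines = series_by_region(toy_grouped, "Male")
print(lines)
```

Inside prepare_figure(), you could then loop over series_by_region(grouped_df, gender).items() and pass each pair to l.line(). One caveat: groupby() sorts regions alphabetically, while pd.unique() keeps their order of appearance, so the colors assigned from Paired[12] may shift.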

 

Yay! Our first interactive data visualization. But this is still far from the dashboard concept.

Also, we can't choose which data is displayed yet. For now, we can only mute lines by clicking on the legend, pan around, and zoom in and out. Cool, but we need more.

Even though Bokeh has its own built-in widgets (we could create dropdowns with them, among other things), I'll stop here.

Bokeh is certainly not beginner-friendly and I don't want to confuse anyone. That makes it, in my opinion, a worse alternative compared to the two that are coming next.

It's like wasting time talking about how good white rice is for you when you could focus on even healthier options, like brown rice or vegetables such as broccoli.

On to the good part, then.

4. Plotly

Again, from Plotly's documentation:

Plotly Express is a built-in part of the plotly library, and is the recommended starting point for creating most common figures. […] The API for these functions was carefully designed to be as consistent and easy to learn as possible, making it easy to switch from a scatter plot to a bar chart to a histogram to a sunburst chart throughout a data exploration session.[4]

Plotly is an amazing tool I've used over and over again, and I can promise it's exactly as they say: easy to learn and easy to switch from one plot to another.

However, we're not going to talk much about Plotly Express today because we're going to go a little bit further. We're going to use Plotly's graph objects to get things done.

We'll go straight into the code that allows us to create the country comparison plot, with the two dropdowns:

```python
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# Define variables
colors = ['#a6cee3', '#1f78b4']

# Define widgets (using the create_dd() function from the Seaborn section)
dd1 = create_dd('Country 1', 0)
dd2 = create_dd('Country 2', 1)

# Create figure and traces
def create_figure(country1, country2):
    fig = make_subplots(
        shared_xaxes=True,
        shared_yaxes=True,
        rows=1,
        cols=2,
        vertical_spacing=0,
        subplot_titles=("Gender = Female", "Gender = Male"),
    )

    # Traces are added gender by gender, two countries each:
    # [Female-country1, Female-country2, Male-country1, Male-country2]
    for j, gender in enumerate(['Female', 'Male']):
        for i, loc in enumerate([country1, country2]):
            subset = df[(df['Gender'] == gender) & (df['Location'] == loc)]
            fig.add_trace(
                go.Scatter(
                    x=subset['Period'],
                    y=subset['Prevalence'],
                    name=loc,
                    line=go.scatter.Line(color=colors[i]),
                    hovertemplate=None,
                    showlegend=(j == 1)  # list each country in the legend only once
                ),
                row=1,
                col=j+1
            )

    # Prettify
    fig.update_xaxes(showspikes=True, spikemode="across")
    fig.update_layout(
        hovermode="x",
        template='simple_white'
    )

    return fig

fig = create_figure(countries[0], countries[1])

# Create the Figure Widget
g = go.FigureWidget(fig)

# Handle what to do when a dropdown value changes
def response(change):
    dfs = []
    for gender in ['Female', 'Male']:
        for loc in [dd1.value, dd2.value]:
            dfs.append(df[(df['Gender'] == gender) & (df['Location'] == loc)])

    x = [temp_df['Period'] for temp_df in dfs]
    y = [temp_df['Prevalence'] for temp_df in dfs]

    # Update all traces at once, in the same order they were created
    with g.batch_update():
        for i in range(len(g.data)):
            g.data[i].x = x[i]
            g.data[i].y = y[i]
            g.data[i].name = dd1.value if i % 2 == 0 else dd2.value

        g.layout.xaxis.title = 'Period'
        g.layout.yaxis.title = 'Prevalence'

dd1.observe(response, names="value")
dd2.observe(response, names="value")

container = widgets.HBox([dd1, dd2])
widgets.VBox([container, g])
```

It might seem long, but that's mainly because I've formatted it so you don't need to scroll horizontally. If you look closely, you'll see we're really doing very few things!

We're reusing parts of the previous snippets, such as the create_dd() function. Yes, we're using ipywidgets again, but their integration with Plotly is extremely smooth. It's all handled through the widgets' observe() method and the response() function we've written, which runs every time the value of either dropdown changes.
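response() only works because the traces sit in a fixed order. create_figure() adds them gender by gender, two countries each, and response() rebuilds its lists with the same nested loops. This small pure-Python sketch spells out that order and checks the i % 2 rule used to rename the traces. trace_layout is a hypothetical helper, not part of Plotly.

```python
GENDERS = ["Female", "Male"]


def trace_layout(country1, country2):
    """Return (trace_index, gender, country) in the order the traces are added."""
    order = []
    for gender in GENDERS:               # outer loop: one subplot per gender
        for loc in [country1, country2]:  # inner loop: the two selected countries
            order.append((len(order), gender, loc))
    return order


for i, gender, loc in trace_layout("Spain", "Italy"):
    # response() names trace i after dd1 when i is even, after dd2 when i is odd
    assert loc == ("Spain" if i % 2 == 0 else "Italy")
    print(i, gender, loc)
```

If the nested loops in the two functions ever diverged, lines would silently receive another country's data.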

And here's the visual result:

 

That was quite easy! Again, using Bokeh to achieve the same is possible but learning how to do it in Plotly is way faster. And the results are great either way.

On to our last friend…

5. Dash

Dash isn't a plotting library per se. It's an amazing framework for building dashboards, and it has become increasingly popular for creating interactive, web-based ones.

Spoiler: Dash was created by Plotly's developers. So here we'll still be using Plotly but now combined with the Dash framework to see what we can create with them.

So, how can we build the same plot but using Dash? Instead of putting the whole code in here as I've been doing, I'd like to divide it into chunks for better comprehension.

As I said at the beginning, I'll be writing separate posts to go more in-depth into these frameworks. So consider following me if you're interested.

We'll start with the imports. Keep in mind that we'll be reusing code we've already seen, so I'm not reimporting libraries like Plotly.

```python
# Install dash and jupyter-dash (run inside a notebook cell)
!pip install dash
!pip install jupyter-dash

# Import
import dash
from dash import dcc, html  # dash_core_components is deprecated in favor of dash.dcc
from jupyter_dash import JupyterDash
```

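Note that I imported JupyterDash because I'm using a Jupyter Notebook. If you're using a regular script, replace the last line with from dash import Dash and create the app with Dash instead of JupyterDash. In recent Dash versions (2.11 and later), Jupyter support is built in, so plain Dash works inside notebooks too.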

The next thing we do is create the app:

```python
app = JupyterDash(__name__)
```

I hope you're familiar with HTML, because we're now going to write a bit of it using Dash's html module. I plan on creating a simple 3-column, 1-row grid, with widths of 15%, 60%, and 25%. The first column will hold the dropdowns, the second the chart, and the third will stay empty.

The first column, then, will consist of a div container holding a title and another div with the dropdowns. This code creates it:

```python
html.Div([
    html.H1('Countries'),
    html.Div(
        [dcc.Dropdown(
            id='country1_dropdown',
            options=df['Location'].unique().tolist(),
            value='',
            placeholder='Select a country'
        ),
        dcc.Dropdown(
            id='country2_dropdown',
            options=df['Location'].unique().tolist(),
            value='',
            placeholder='Select a country'
        )]
    )
])
```

It may seem messy at first, but it's really simple. Each HTML element takes its children as a list argument. In this case, the outer html.Div wraps an html.H1 and another html.Div, and the two dropdowns sit inside that inner html.Div.

 

Let's now create the graph code:

```python
html.Div([
    dcc.Graph(
        id='chart'
    )
])
```

This one was simpler. It's just a div container wrapping a graph with ID chart.

 

However, no dashboard is acceptable if there's no style applied to it. How do we add some CSS to these elements?

Easy, each element has an optional style parameter we can use to add our CSS. Here's how the last code snippet ends up with some styling:

```python
html.Div([
    dcc.Graph(
        id='chart',
        style={}
    )
], style={
    'grid-column-start': 'second',
    'grid-column-end': 'third',
})
```

I already hear you saying: "Sure, we know how to create both elements. But how do we actually create the layout?" Simple: just put them all into a container div and assign it to app.layout:

```python
app.layout = html.Div([
    # Dropdown menu
    html.Div([
        html.H1('Countries'),
        html.Div(
            [dcc.Dropdown(
                id='country1_dropdown',
                options=df['Location'].unique().tolist(),
                value='',
                placeholder='Select a country',
                style={'margin-bottom': '10px', 'max-width': '200px'}
            ),
            dcc.Dropdown(
                id='country2_dropdown',
                options=df['Location'].unique().tolist(),
                value='',
                placeholder='Select a country',
                style={'max-width': '200px'}
            )]
        )
    ], style={
        'grid-column-start': 'first',
        'grid-column-end': 'second',
        'padding': '2%',
        'justify-self': 'center'
    }),

    # Plot
    html.Div([
        dcc.Graph(
            id='chart',
            style={}
        )
    ], style={
        'grid-column-start': 'second',
        'grid-column-end': 'third',
    })
], style={
    'width': '100vw',
    'display': 'inline-grid',
    'grid-template-columns': '[first] 15% [second] 60% [third] 25%',
    'grid-template-rows': '[row] 100%',
    'grid-gap': '1rem',
    'align-items': 'right',
})
```
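The grid-template-columns value names the lines between columns ([first], [second], [third]). Each child then picks its column by line name. For example, the chart spans from the line named second to the line named third, which is the 60% track. If you'd rather generate those style values than type them by hand, here's a small sketch with two hypothetical helpers:

```python
def grid_template(columns):
    """Join (line_name, width) pairs into a grid-template-columns value."""
    return " ".join(f"[{name}] {width}" for name, width in columns)


def grid_area(start, end):
    """Style dict placing a child between two named grid lines."""
    return {"grid-column-start": start, "grid-column-end": end}


layout_columns = [("first", "15%"), ("second", "60%"), ("third", "25%")]
template = grid_template(layout_columns)
print(template)  # [first] 15% [second] 60% [third] 25%
print(grid_area("second", "third"))
```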

If you ran this now, you'd see empty graphs. We can add data to them using callbacks. Callbacks declare input and output dependencies, which Dash uses to update the data being shown and provide the interaction we're looking for.

Next you'll see a function called update_chart(), which is essentially a copy-paste of the create_figure() function we used in the Plotly section.

```python
from dash import Input, Output

colors = ['#a6cee3', '#1f78b4']

# make_subplots and go come from the Plotly section; app from the previous step
@app.callback(
    Output('chart', 'figure'),
    Input('country1_dropdown', 'value'),
    Input('country2_dropdown', 'value')
)
def update_chart(country1, country2):
    # Create figure and traces
    fig = make_subplots(
        shared_xaxes=True,
        shared_yaxes=True,
        rows=1,
        cols=2,
        vertical_spacing=0,
        subplot_titles=("Gender = Female", "Gender = Male"),
    )

    for j, gender in enumerate(['Female', 'Male']):
        for i, loc in enumerate([country1, country2]):
            subset = df[(df['Gender'] == gender) & (df['Location'] == loc)]
            fig.add_trace(
                go.Scatter(
                    x=subset['Period'],
                    y=subset['Prevalence'],
                    name=loc,
                    line=go.scatter.Line(color=colors[i]),
                    hovertemplate=None,
                    showlegend=(j == 1)  # list each country in the legend only once
                ),
                row=1,
                col=j+1
            )

    # Prettify
    fig.update_xaxes(showspikes=True, spikemode="across")
    fig.update_layout(
        hovermode="x",
        template='simple_white',
    )

    return fig
```

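So the new part here is the decorator above the function definition. It says the callback depends on two inputs, the value properties of the country1_dropdown and country2_dropdown components. Its output is the figure property of the component with id='chart'. Dash passes the two values to update_chart() in the same order the inputs are declared.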
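In the real framework, the browser notices that a dropdown's value changed and asks the Dash server to run the matching callback. The toy model below only mimics the bookkeeping side of that idea, with a plain dict standing in for component properties. Every toy_* name is hypothetical, and this is not how Dash is implemented internally.

```python
# Component properties keyed by (component_id, property), like Dash's Input/Output pairs
toy_state = {
    ("country1_dropdown", "value"): "",
    ("country2_dropdown", "value"): "",
    ("chart", "figure"): None,
}
toy_callbacks = []


def toy_callback(output, *inputs):
    """Register func so it recomputes `output` whenever one of `inputs` changes."""
    def register(func):
        toy_callbacks.append((output, inputs, func))
        return func
    return register


def toy_set_prop(key, value):
    """Simulate a user changing a component property."""
    toy_state[key] = value
    for output, inputs, func in toy_callbacks:
        if key in inputs:
            # Arguments are passed in the order the inputs were declared
            toy_state[output] = func(*(toy_state[k] for k in inputs))


@toy_callback(("chart", "figure"),
              ("country1_dropdown", "value"),
              ("country2_dropdown", "value"))
def toy_update_chart(country1, country2):
    return f"figure comparing {country1!r} and {country2!r}"


toy_set_prop(("country1_dropdown", "value"), "Spain")
toy_set_prop(("country2_dropdown", "value"), "Italy")
print(toy_state[("chart", "figure")])
```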

Finally, we run the app:

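```python
# Run app
if __name__ == '__main__':
    # JupyterDash provides run_server(); with plain Dash in recent versions, use app.run() instead
    app.run_server()
```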

And then navigate to http://127.0.0.1:8050 or whichever IP the server is running at and play with it just like I do here:

 

See how fluid it is. We can combine the power of HTML and Python's plots to create pretty dashboards, all thanks to Dash.

6. Telling the Story

All this was pretty introductory. We didn't build a dashboard or tell a story; we just saw some tools to make graphs interactive.

In this section, I'll share a sample dashboard someone could build to tell a story to stakeholders. It won't be the prettiest or the most complete one, but it'll be enough to show what Dash is capable of and, hopefully, convince you to start using it for your future visualizations.

Disclaimer: fall in love with the utility Dash and Plotly provide, not the design I've implemented. My goal was to showcase what Dash is capable of instead of actually trying to convince you that overweight is a real problem worldwide.

 

On to the insights now:

General basis

Many scientists commonly describe obesity as a pandemic. Whether it fits the actual definition of a pandemic or not is irrelevant. What's relevant is how prevalent it is and the negative effects it has on our health.

Being overweight is the step before being obese. So overweight matters too because, when not controlled, it can lead to obesity.

If we compare the world map from 1975 with the one from 2016, we can clearly see that the latter has more vivid colors. This isn't good news, because the yellower a country is, the higher its overweight prevalence. So overweight prevalence has increased a lot throughout the whole world.

Globalization, improvements in technology and science, and many other factors have made food abundant in most countries; or at least, we can guess that access to food has increased in general terms. That, combined with increasingly sedentary lifestyles and the overconsumption of unhealthy food, has led to this huge increase in overweight prevalence.

Oh, and as overweight prevalence has increased, so has obesity.

Top Countries

We see extremely high prevalences (~88.7%) in countries like Nauru or Palau. We haven't verified the data ourselves, but several reports insist on tackling the obesity problem in these countries (plus, coming from the WHO, the data should be reliable).

Focusing now on Nauru, "From the 1980s, Nauruans led a sedentary lifestyle with an unhealthy diet, contributing to the worst health conditions in the Pacific region".[5]

Also, "Approximately 90% of the land area of Nauru is covered with phosphate deposits, with the majority strip-mined and non-arable. This has led to Nauruan reliance on processed food, high in both sugar and fat, imported from large Oceanian countries such as Australia and New Zealand" [6]

We could argue that Nauru is an outlier, but there are several countries in similar positions. Worrying, to say the least.
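The dashboard's code isn't included in this post. As an illustration only, here's one way a "top countries" ranking like this could be computed with pandas, assuming the cleaned df columns from the first section. top_countries is a hypothetical helper, and the numbers are made up:

```python
import pandas as pd

# Made-up rows using the cleaned column names
toy_top = pd.DataFrame({
    "Location": ["A", "B", "C", "D"],
    "Period": [2016] * 4,
    "Gender": ["Both sexes"] * 4,
    "Prevalence": [80.0, 40.0, 65.0, 20.0],
})


def top_countries(data, year, n=3, gender="Both sexes"):
    """Countries with the highest prevalence for one year and gender."""
    mask = (data["Period"] == year) & (data["Gender"] == gender)
    return data.loc[mask].nlargest(n, "Prevalence")[["Location", "Prevalence"]]


print(top_countries(toy_top, 2016, n=2))
```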

Time Difference

Whether a country's prevalence has increased a lot or not may depend on several factors and all should be treated independently.

However, I'm fascinated by Singapore: its prevalence has remained almost unchanged over the 41-year period. Several Western European countries are also among those with the smallest increases. I'm sure we could find a pattern here if we looked for one.

On the other extreme, Botswana increased its prevalence a lot when we focus on the difference between 1975 and 2016. However, in 2016, they were still far from the top. That's probably because their access to food has increased in the last few years, but Botswana is still Botswana.
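A 1975-vs-2016 comparison like this one could be computed by pivoting years into columns and subtracting. Here's a minimal sketch on made-up data. It assumes the frame has already been filtered to a single gender; otherwise pivot() would find duplicate Location/Period pairs and raise an error.

```python
import pandas as pd

# Made-up rows, already filtered to one gender
toy_years = pd.DataFrame({
    "Location": ["A", "A", "B", "B", "C", "C"],
    "Period": [1975, 2016] * 3,
    "Prevalence": [30.0, 31.0, 10.0, 45.0, 40.0, 60.0],
})

# One row per country, one column per year
wide = toy_years.pivot(index="Location", columns="Period", values="Prevalence")

# Increase between the two years, smallest first
change = (wide[2016] - wide[1975]).sort_values()
print(change)
```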

Gender

In most countries, men have historically led the overweight prevalence. We see men being more overweight than women most of the time.

We could try to create different hypotheses here but one's undeniable: women have historically cared a lot more about their appearance than men, probably due to social pressure. We live in a sexist world, and these are just some of the consequences of it.

It's only recently that we're seeing countries like Brazil where the overweight prevalence in women has surpassed the men's. Not something to be happy about, but could we consider it a feminist act? Probably not.

But still, the pattern is clear if we generalize it.
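To put a number on that pattern, one option is to compute the male-minus-female gap per country for a given year and count how often it's positive. gender_gap is a hypothetical helper, and the toy values below are invented:

```python
import pandas as pd

# Made-up rows using the cleaned column names
toy_gender = pd.DataFrame({
    "Location": ["A", "A", "B", "B"],
    "Period": [2016] * 4,
    "Gender": ["Male", "Female", "Male", "Female"],
    "Prevalence": [60.0, 55.0, 50.0, 58.0],
})


def gender_gap(data, year):
    """Male minus female prevalence per country for one year (positive = men higher)."""
    sub = data[(data["Period"] == year) & (data["Gender"].isin(["Male", "Female"]))]
    wide = sub.pivot(index="Location", columns="Gender", values="Prevalence")
    return wide["Male"] - wide["Female"]


gap = gender_gap(toy_gender, 2016)
print(gap)
print("Share of countries where men lead:", (gap > 0).mean())
```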

Final conclusion

The hard truth is that overweight prevalence – and therefore, obesity – is increasing as time goes by. And it affects everyone: all genders, all countries. No single one is safe.

Is it really a pandemic? One could think so.

What's worrying is that, in most cases, being overweight is a consequence of bad habits, poor nutrition, a sedentary lifestyle, and bad sleep. Add to that the overall caloric intake that has been increasing for so many years and doesn't seem to plateau.

All of these depend on ourselves, but most people clearly aren't doing anything about them.

It's already been proven that obesity kills people: it's linked to heart attacks, strokes, diabetes, cancer… And we still let it go up.

We must stop this, but we can only take care of ourselves. So make sure you choose well.

As Jerzy Gregorek said:

"Hard choices, easy life. Easy choices, hard life"

The End

Thanks for reading it through!

This was the longest post I've ever written but I've enjoyed it as a child would enjoy playing with toys. I hope you did too.

The aim of this post was to share some tools I use to create interactive visualizations and, finally, to do a brief and informal storytelling exercise with the dashboard I created, extracting some insights and hypotheses from the data we've seen.

In the resources section, you'll see a link to check all the code used for this post![7]

Thanks for reading the post! I really hope you enjoyed it and found it insightful. Follow me and subscribe to my mailing list for more content like this; it helps a lot! @polmarin

If you'd like to support me further, consider subscribing to Medium's Membership through the link you find below: it won't cost you any extra penny but will help me through this process.

Join Medium with my referral link – Pol Marin

Resources

[1] Prevalence of overweight among adults, BMI >= 25 (age-standardized estimate) (%) – World Health Organization

[2] seaborn: statistical data visualization

[3] Bokeh

[4] Plotly express in Python – Plotly

[5] Nishiyama, Takaaki (27 May 2012). "Nauru: An island plagued by obesity and diabetes". Asahi Shimbun.

[6] "I have seen so many funerals for such a small island": The astonishing story of Nauru, the tiny island nation with the world's highest rates of type 2 diabetes

[7] Interactive Viz Repo – GitHub

