Maximize Your Insights by Choosing the Best Chart: Network, Heatmap, or Sankey?

Author:Murphy  |  View: 24377  |  Time: 2025-03-23 13:04:31

Visualization is an important part of data analysis because it can transform data into insights and help you with storytelling. In this blog post, I will focus on Network charts, Heatmaps, and Sankey charts. These charts take the same input, but each is designed with a specific goal in mind, so their interpretability differs. I will describe the differences between the Network, Heatmap, and Sankey charts and their applications, and I will demonstrate their interpretability with a hands-on example. All examples are created in Python using the D3Blocks library.


The Input for Network, Heatmap, and Sankey Charts.

As a data scientist, making plots is a common but essential task. Sometimes these plots serve as sanity checks, and sometimes they end up in presentations and form the foundation of the story. Especially in the latter case, we aim to transform complex information into logical graphical visualizations.

Creating plots is like photography. You want to capture the scenery that tells the story.

However, deciding which chart to use is not always easy because, although charts can have similar input, each is designed to describe a specific part of the scenery. All three charts require source, target, and weight information as input. A small example is shown below. It describes which variables (or nodes) are connected and how strongly. In other words, Penny is connected to Leonard with strength 5. The second source node is again Penny, who is also connected to Amy, but with a lower strength of 3, and so on.

# Source node names
source = ['Penny', 'Penny', 'Amy', 'Bernadette', 'Bernadette', 'Sheldon', 'Sheldon', 'Sheldon', 'Rajesh']
# Target node names
target = ['Leonard', 'Amy', 'Bernadette', 'Rajesh', 'Howard', 'Howard', 'Leonard', 'Amy', 'Penny']
# Edge Weights
weight = [5, 3, 2, 2, 5, 2, 3, 5, 2]

The nodes are thus the union of the names in source and target. The edges are the relationships between each source and target. Depending on the chart, these relationships are treated as directed or undirected. The weight values describe the strength of each relationship. Notably, the source-target-weight information can also be represented as a (sparse) adjacency matrix in which both the columns and the index are the nodes, and every element with a value > 0 is considered an edge. This form is often used to create heatmaps, but it essentially contains the same information. In the next section, I will describe how this information is translated into the charts.

# Install d3blocks for the following examples
pip install d3blocks

# Install clusteval for cluster evaluation (required for the heatmaps)
pip install clusteval
# Import
from d3blocks import D3Blocks
# Initialize
d3 = D3Blocks()
# Convert
adjmat = d3.vec2adjmat(source, target, weight)
# Print
print(adjmat)

# target      Amy  Bernadette  Howard  Leonard  Penny  Rajesh  Sheldon
# source
# Amy         0.0         2.0     0.0      0.0    0.0     0.0      0.0
# Bernadette  0.0         0.0     5.0      0.0    0.0     2.0      0.0
# Howard      0.0         0.0     0.0      0.0    0.0     0.0      0.0
# Leonard     0.0         0.0     0.0      0.0    0.0     0.0      0.0
# Penny       3.0         0.0     0.0      5.0    0.0     0.0      0.0
# Rajesh      0.0         0.0     0.0      0.0    2.0     0.0      0.0
# Sheldon     5.0         0.0     2.0      3.0    0.0     0.0      0.0
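The conversion itself is simple. As a minimal sketch of what it does (not the actual `vec2adjmat` source), the plain-Python helpers below build the same matrix as a dict of dicts. The helper names `to_adjacency` and `symmetrize` are hypothetical, and summing repeated source-target pairs is an assumption. The `symmetrize` step shows what treating the relationships as undirected means: the weight between two nodes becomes the sum of both directions.

```python
def to_adjacency(source, target, weight):
    """Build a dense adjacency matrix (dict of dicts) from parallel edge lists.

    Rows are sources, columns are targets, and the node set is the sorted
    union of all names. Repeated source-target pairs are summed (assumption).
    """
    nodes = sorted(set(source) | set(target))
    adj = {row: {col: 0.0 for col in nodes} for row in nodes}
    for s, t, w in zip(source, target, weight):
        adj[s][t] += w
    return nodes, adj


def symmetrize(nodes, adj):
    """Undirected view: the weight between u and v becomes adj[u][v] + adj[v][u]."""
    return {u: {v: adj[u][v] + adj[v][u] for v in nodes} for u in nodes}


source = ['Penny', 'Penny', 'Amy', 'Bernadette', 'Bernadette', 'Sheldon', 'Sheldon', 'Sheldon', 'Rajesh']
target = ['Leonard', 'Amy', 'Bernadette', 'Rajesh', 'Howard', 'Howard', 'Leonard', 'Amy', 'Penny']
weight = [5, 3, 2, 2, 5, 2, 3, 5, 2]

nodes, adj = to_adjacency(source, target, weight)
print(nodes)
# Directed: only the Penny -> Leonard direction carries weight.
print(adj['Penny']['Leonard'], adj['Leonard']['Penny'])
# Undirected: both directions now carry the same weight.
und = symmetrize(nodes, adj)
print(und['Penny']['Leonard'], und['Leonard']['Penny'])
```

With pandas, `pd.DataFrame({'source': source, 'target': target, 'weight': weight}).pivot_table(index='source', columns='target', values='weight', aggfunc='sum', fill_value=0)` gives a similar table, but it only lists names that occur as a source in the rows and as a target in the columns, so you would still need to reindex both axes on the union of names.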

Charts Translate Data Differently.

Network, Sankey, and Heatmap charts each have their own properties and can therefore present the same data differently. A brief summary is as follows:

  • Network charts visualize relationships between entities, where nodes represent entities and edges represent the relationships between them. Advantage: This type of chart is useful for understanding complex behavior when you also need to know (some of) the exact relationships between the entities. Disadvantage: The chart becomes cluttered and difficult to read with large datasets. However, by using a different layout or by breaking the network on the edge weights, it can become effective again. For more details on how to use the interactive functionalities, read the following blog [1]:

Creating beautiful stand-alone interactive D3 charts with Python

  • Heatmaps are effective for visualizing the strength or magnitude of the relationships between variables, where values are represented by (shades of) colors. Advantage: This type of chart is useful for identifying patterns and trends in large datasets with multiple variables. Where networks can become a hairball, heatmaps can provide structured insights. Disadvantage: You easily lose track of individual relationships. However, when you provide clear labels and cluster the rows and/or columns, the relationships between the variables become easier to interpret.
  • Sankey charts visualize the flow of data or resources between entities, where nodes represent different stages or entities and links represent the flow between them. Advantage: This type of chart is useful for understanding complex processes or systems and for identifying areas for optimization or improvement. Disadvantage: It can become difficult to read with too many stages or entities. For more details, read the following blog [2]:

Hands-on Guide to Create Beautiful Sankey Charts in d3js with Python.


The Applications of Network, Heatmap, and Sankey Charts

All three charts can be created with the D3Blocks library. More details about D3Blocks can be found here [3]:

D3Blocks: The Python Library to Create Interactive and Standalone D3js Charts.

The applications of the Network, Heatmap, and Sankey charts differ. Network charts are often used to visualize social media networks, such as Twitter or Facebook, where nodes represent users and edges represent the relationships between them. Heatmaps are used in many applications where the number of data points can be large, such as stock prices, gene expression data, and climate data. Sankey charts are used to visualize flows, such as customer journey data with different stages of the journey (e.g., website visit, sign-up, purchase). Other examples are energy flows and supply chains, with different sources and uses of energy or different stages of the supply chain (e.g., raw materials, manufacturing, distribution).
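For flow data such as customer journeys, the source-target-weight input usually has to be derived from raw sequences first. A minimal sketch, assuming each journey is recorded as an ordered list of stage names: every pair of consecutive stages counts as one transition. The journeys below are hypothetical, and `journeys_to_edges` is an illustrative helper, not a D3Blocks function.

```python
from collections import Counter


def journeys_to_edges(journeys):
    """Count consecutive stage transitions across all journeys.

    Returns parallel source, target, and weight lists, sorted for stable output.
    """
    counts = Counter()
    for stages in journeys:
        # Each pair of consecutive stages is one transition (edge).
        for a, b in zip(stages, stages[1:]):
            counts[(a, b)] += 1
    pairs = sorted(counts)
    source = [a for a, _ in pairs]
    target = [b for _, b in pairs]
    weight = [counts[p] for p in pairs]
    return source, target, weight


# Hypothetical journeys, one list of stages per customer.
journeys = [
    ['website visit', 'sign-up', 'purchase'],
    ['website visit', 'sign-up'],
    ['website visit', 'exit'],
    ['website visit', 'sign-up', 'purchase'],
]
source, target, weight = journeys_to_edges(journeys)
for s, t, w in zip(source, target, weight):
    print(f'{s} -> {t}: {w}')
```

The result has the same shape as the source, target, and weight lists used throughout this post. Note that a journey that revisits a stage would introduce a cycle, which Sankey layouts generally cannot draw.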


A Hands-on Comparison between Network, Heatmap, and Sankey Charts

Let's load the Energy data set [4] and compare the interpretability of the three charts. The Energy data set contains 48 nodes and 68 weighted relationships that describe the flow of energy. What you can observe is that the network graph makes it easy to understand the exact relationships between the nodes. The heatmap, on the other hand, shows a global view of all the relationships, whereas the Sankey chart shows the flow between the nodes. As an example, in this data set, the Electricity grid node appears to be an important hub: it takes a central position in the network chart and has many flows going to and from it. You can reproduce the results using the following code blocks:

# ######################
# Create network graph #
# ######################

# Load library
from d3blocks import D3Blocks
# Initialize
d3 = D3Blocks()
# Load the energy data set
df = d3.import_example(data='energy')

# Create the network graph
d3.d3graph(df, cmap='Set2')
# Extract the node colors from the network graph.
node_colors = d3.D3graph.node_properties
D3graph created in D3Blocks. The interactive HTML version can be found on my GitHub pages.
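The visual impression that a node is a hub can be checked numerically with its weighted degree: the sum of the weights of all edges going into and out of the node. The sketch below uses the small example lists from the beginning of this post rather than the energy data, and `weighted_degree` is a hypothetical helper; d3graph may size or rank nodes differently.

```python
from collections import defaultdict


def weighted_degree(source, target, weight):
    """Return {node: (in_strength, out_strength, total_strength)}."""
    incoming = defaultdict(float)
    outgoing = defaultdict(float)
    for s, t, w in zip(source, target, weight):
        outgoing[s] += w
        incoming[t] += w
    nodes = set(incoming) | set(outgoing)
    return {n: (incoming[n], outgoing[n], incoming[n] + outgoing[n]) for n in nodes}


source = ['Penny', 'Penny', 'Amy', 'Bernadette', 'Bernadette', 'Sheldon', 'Sheldon', 'Sheldon', 'Rajesh']
target = ['Leonard', 'Amy', 'Bernadette', 'Rajesh', 'Howard', 'Howard', 'Leonard', 'Amy', 'Penny']
weight = [5, 3, 2, 2, 5, 2, 3, 5, 2]

strength = weighted_degree(source, target, weight)
# Rank nodes by total strength, breaking ties alphabetically.
ranking = sorted(strength, key=lambda n: (-strength[n][2], n))
for node in ranking:
    s_in, s_out, total = strength[node]
    print(f'{node:<11} in={s_in:4.1f} out={s_out:4.1f} total={total:4.1f}')
```

Splitting the total into incoming and outgoing strength also tells you whether a hub mainly receives (a sink, such as Leonard here) or mainly sends (a source, such as Sheldon). For the energy data, the same function could be applied to the DataFrame columns, assuming they are named source, target, and weight.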

The clustering in the heatmap is created using the clusteval library [5]. This library determines the optimal cluster cut-off using a cluster evaluation method such as the Silhouette score or the Davies-Bouldin index (DBindex); density-based clustering with DBSCAN is also supported. The defaults can be changed with the cluster_params argument of d3.heatmap, as shown at the end of the heatmap code section. The data is z-score normalized (scaler='zscore').
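To illustrate how a cluster evaluation metric can pick the cut-off, here is a minimal sketch with the Silhouette score. It is not clusteval's implementation: it uses 1-D toy values, absolute distance, and single linkage (for 1-D data, cutting a single-linkage dendrogram into k clusters is the same as splitting at the k - 1 widest gaps), whereas the heatmap example configures complete linkage. The silhouette of a sample is s(i) = (b - a) / max(a, b), where a is the mean distance to its own cluster and b the mean distance to the nearest other cluster; the cut-off with the highest mean score wins. All helper names are hypothetical.

```python
from statistics import mean


def single_linkage_cut_1d(values, k):
    """Split 1-D values into k clusters at the k - 1 widest gaps.

    For 1-D data this equals cutting a single-linkage dendrogram at k clusters.
    Returns one integer cluster label per value, in the input order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    # (gap width, position in sorted order) for every pair of neighbours.
    gaps = [(values[order[j + 1]] - values[order[j]], j) for j in range(len(order) - 1)]
    cuts = sorted(pos for _, pos in sorted(gaps, reverse=True)[:k - 1])
    labels = [0] * len(values)
    cluster = 0
    for j, idx in enumerate(order):
        labels[idx] = cluster
        # Start a new cluster after each cut position.
        if cluster < len(cuts) and j == cuts[cluster]:
            cluster += 1
    return labels


def silhouette(values, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b) using absolute distance."""
    n = len(values)
    scores = []
    for i in range(n):
        own = [abs(values[i] - values[j]) for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = mean(own)
        b = min(
            mean(abs(values[i] - values[j]) for j in range(n) if labels[j] == c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


def best_cut(values, min_clust=2, max_clust=5):
    """Evaluate every cut-off from min_clust to max_clust and keep the best."""
    scores = {k: silhouette(values, single_linkage_cut_1d(values, k))
              for k in range(min_clust, max_clust + 1)}
    return max(scores, key=scores.get), scores


values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.9, 10.2, 10.0]
k, scores = best_cut(values)
for n_clusters, score in scores.items():
    print(f'{n_clusters} clusters: silhouette = {score:.3f}')
print('best cut-off:', k)
```

The min_clust and max_clust arguments play the same role as the keys of the same name in the cluster_params example: they bound the range of cut-offs that is evaluated.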

# ################
# Create Heatmap #
# ################

# Initialize
d3 = D3Blocks()
# Load the Energy data set
df = d3.import_example(data='energy')

# Create the default heatmap but hide it for now; we first adjust the node colors.
d3.heatmap(df, showfig=False)

# Give the heatmap nodes the same colors as the network graph.
# node_colors is taken from the network graph code block above.
color_col = d3.node_properties.columns.get_loc('color')
for i, label in enumerate(d3.node_properties['label']):
    if node_colors.get(label) is not None:
        d3.node_properties.iloc[i, color_col] = node_colors[label]['color']

# The colors in the dataframe are used in the chart.
print(d3.node_properties)

# Make the chart
d3.show(showfig=True, figsize=[600, 600], fontsize=8, scaler='zscore')

# You can make adjustments in the clustering:
d3.heatmap(df, cluster_params={'evaluate':'dbindex',
                               'metric':'hamming',
                               'linkage':'complete',
                               'normalize': False,
                               'min_clust': 3,
                               'max_clust': 15})
Heatmap created in D3Blocks. The interactive HTML version can be found on my GitHub pages.
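The scaler='zscore' option rescales the values before they are mapped to colors, so that a few heavy edges do not dominate the color scale. A minimal sketch of z-score normalization in plain Python, assuming it is applied per row with the population standard deviation; which axis and variant D3Blocks uses is not stated in this post. `zscore_rows` is a hypothetical helper, and the rows are taken from the printed adjacency matrix of the small example.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Return a copy of matrix where each row has mean 0 and standard deviation 1.

    Constant rows (standard deviation 0) are mapped to all zeros.
    """
    scaled = []
    for row in matrix:
        mu = mean(row)
        sd = pstdev(row)
        scaled.append([0.0 if sd == 0 else (x - mu) / sd for x in row])
    return scaled


# Rows for Penny, Sheldon, and Howard from the printed adjacency matrix.
matrix = [
    [3.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0],
    [5.0, 0.0, 2.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
for row in zscore_rows(matrix):
    print([round(v, 2) for v in row])
```

After scaling, each row is expressed relative to its own average, which makes rows with very different total weights comparable in one heatmap.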
# ###############
# Create Sankey #
# ###############
# Initialize
d3 = D3Blocks()
# Load the Energy data set
df = d3.import_example(data='energy')

# Create the Sankey chart
d3.sankey(df, showfig=True)
Sankey chart created in D3Blocks. The interactive HTML version can be found on my GitHub pages. Supplies are on the left, and demands are on the right. Links show how varying amounts of energy are converted or transmitted before being consumed or lost.
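The left-to-right arrangement in this caption can be approximated from the edge list alone. The sketch below assumes the flow is acyclic and follows the node-sizing rule of the d3-sankey layout (a node's size is the larger of its total inflow and total outflow). The column rule used here, the longest path from a supply, is a simplification: d3-sankey's default justify alignment additionally moves every demand to the rightmost column, and D3Blocks' settings may differ. The flows and numbers are made up for illustration, and `sankey_layout` is a hypothetical helper.

```python
from collections import defaultdict, deque


def sankey_layout(source, target, weight):
    """Return {node: (column, value)} for an acyclic flow.

    column: length of the longest path from any node without inflow.
    value:  max(total inflow, total outflow), used as the node height.
    """
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set(source) | set(target)
    for s, t, w in zip(source, target, weight):
        outflow[s] += w
        inflow[t] += w
        children[s].append(t)
        indegree[t] += 1

    # Kahn's algorithm: visit nodes in topological order and push the
    # longest-path depth forward along every link.
    column = {n: 0 for n in nodes}
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            column[child] = max(column[child], column[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if visited != len(nodes):
        raise ValueError('flow contains a cycle; a Sankey layout needs an acyclic graph')
    return {n: (column[n], max(inflow[n], outflow[n])) for n in nodes}


# Hypothetical flows with made-up numbers.
source = ['Solar', 'Wind', 'Gas', 'Grid', 'Grid', 'Gas']
target = ['Grid', 'Grid', 'Grid', 'Homes', 'Losses', 'Industry']
weight = [10, 20, 30, 45, 15, 5]

layout = sankey_layout(source, target, weight)
for node, (col, value) in sorted(layout.items(), key=lambda kv: (kv[1][0], kv[0])):
    print(f'column {col}: {node} ({value:g})')
```

Here Industry lands in column 1 because it is fed directly by Gas; a justified layout would push it to the right edge next to Homes and Losses.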

We can also adjust the colors of the nodes to match those of the other charts.

# Initialize
d3 = D3Blocks(chart='Sankey', frame=True)
# Load data set
df = d3.import_example(data='energy')

# Set default node properties
d3.set_node_properties(df)

# Give the Sankey nodes the same colors as the network graph.
color_col = d3.node_properties.columns.get_loc('color')
for i, label in enumerate(d3.node_properties['label']):
    if node_colors.get(label) is not None:
        d3.node_properties.iloc[i, color_col] = node_colors[label]['color']

# The colors in the dataframe are used in the chart.
print(d3.node_properties)
#   id                               label    color
#    0                  Agricultural_waste  #66c2a5
#    1                      Bio-conversion  #66c2a5
#    2                              Liquid  #e5c494
#    3                              Losses  #e78ac3
#    4                               Solid  #66c2a5
#    5                                 Gas  #fc8d62
#    ...

# Create edge properties
d3.set_edge_properties(df, color='target', opacity='target')
# Show the chart
d3.show()
Sankey chart created in D3Blocks. The node colors match those of the other two charts, and the edge colors are set automatically.

Improve the Interpretability Through Interactive Charts

Interactive charts can help to enhance interpretation and highlight areas of interest. One way is the pan-and-zoom functionality, which is demonstrated in the d3graph chart. Another way to gain more insight is to break the network on edge strength using the slider that is created automatically. This makes it quicker to understand the relationships between the nodes.
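Conceptually, breaking the network on edge strength means hiding every edge whose weight is below a threshold and looking at which groups of nodes remain connected. A minimal sketch of that idea in plain Python, treating edges as undirected and using a union-find structure to collect the connected components; it illustrates the concept and is not d3graph's slider code. `components_above` is a hypothetical helper, applied to the small example lists from the beginning of this post.

```python
def components_above(source, target, weight, threshold):
    """Drop edges with weight < threshold and return the connected components.

    Edges are treated as undirected. A node that loses all its edges forms a
    single-node component. Components are returned as sorted lists of names.
    """
    parent = {n: n for n in set(source) | set(target)}

    def find(n):
        # Follow parent links to the root, halving the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for s, t, w in zip(source, target, weight):
        if w >= threshold:
            parent[find(s)] = find(t)  # merge the two components

    groups = {}
    for n in parent:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


source = ['Penny', 'Penny', 'Amy', 'Bernadette', 'Bernadette', 'Sheldon', 'Sheldon', 'Sheldon', 'Rajesh']
target = ['Leonard', 'Amy', 'Bernadette', 'Rajesh', 'Howard', 'Howard', 'Leonard', 'Amy', 'Penny']
weight = [5, 3, 2, 2, 5, 2, 3, 5, 2]

for threshold in (1, 3, 5):
    print(threshold, components_above(source, target, weight, threshold))
```

Reading the output from top to bottom mimics dragging the slider: the single connected group falls apart step by step until only the strongest pairs, such as Penny and Leonard, remain.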


Build Your Story Using a Stacked Approach

As the previous sections show, there may not be one best chart for all your use cases. It is often beneficial to use a stacked approach that describes the data from different angles and/or depths. For example, you can start with a heatmap to show the overall weak and strong relationships. Then, select a cluster or region of interest and use a network chart to analyze the exact relationships in more depth. Finally, you can use a Sankey chart if you need to describe the flow between the nodes and their dependencies.
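The drill-down step of this stacked approach comes down to filtering the edge list to the nodes of the selected cluster before drawing the network chart. A minimal sketch, with a hypothetical `subnetwork` helper and a hand-picked node selection standing in for a cluster found in the heatmap.

```python
def subnetwork(source, target, weight, selected):
    """Return the edges whose source and target are both in `selected`."""
    keep = set(selected)
    edges = [(s, t, w) for s, t, w in zip(source, target, weight)
             if s in keep and t in keep]
    if not edges:
        return [], [], []
    # Unzip the kept edges back into parallel lists.
    sub_source, sub_target, sub_weight = (list(col) for col in zip(*edges))
    return sub_source, sub_target, sub_weight


source = ['Penny', 'Penny', 'Amy', 'Bernadette', 'Bernadette', 'Sheldon', 'Sheldon', 'Sheldon', 'Rajesh']
target = ['Leonard', 'Amy', 'Bernadette', 'Rajesh', 'Howard', 'Howard', 'Leonard', 'Amy', 'Penny']
weight = [5, 3, 2, 2, 5, 2, 3, 5, 2]

# Hypothetical cluster of interest.
cluster = ['Penny', 'Leonard', 'Amy', 'Sheldon']
for s, t, w in zip(*subnetwork(source, target, weight, cluster)):
    print(f'{s} -> {t}: {w}')
```

Because the filtered lists keep the source-target-weight shape, they can be fed into the same workflow as the full data set.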


Summary

In summary, choosing the right visualization technique is essential to gain insights into a data set effectively. The choice of chart depends on, among other things, the type of data set and the research question. In this blog, we compared three popular chart types, Network charts, Heatmaps, and Sankey charts, using hands-on examples. Creating charts is an important part of the analysis, and if you give extra attention to the interpretation, the storyline can be communicated to the audience more effectively.

Be Safe. Stay Frosty.

Cheers E.




References

  1. E. Taskesen, Creating beautiful stand-alone interactive D3 charts with Python, Medium (Towards Data Science), Feb. 2022
  2. E. Taskesen, Hands-on Guide to Create beautiful Sankey Charts in d3js with Python, Medium (Towards Data Science), Oct. 2022
  3. E. Taskesen, D3Blocks: The Python Library to Create Interactive and Standalone D3js Charts, Medium (Towards Data Science), Sept. 2022
  4. Department of Energy & Climate Change, Tom Counsell (Open Government Licence v3.0)
  5. E. Taskesen, From Data to Clusters: When is Your Clustering Good Enough? Medium, Apr. 2023
