How to Create Beautiful Age Distribution Graphs With Seaborn and Matplotlib (Including Animation)

Author:Murphy | View: 29223 | Time: 2025-03-23 18:17:09

Graph Tutorial

Today, I want to show you how to create beautiful age distribution graphs like the ones above using Matplotlib and Seaborn.

Age distribution graphs are excellent for visualizing the demographics of a country or region. They are fascinating, but the default Seaborn and Matplotlib styles don't look good enough for us.

Here's what you'll learn in this tutorial:

How to create a Seaborn style
Improving the axes to make them readable and informative
Adding a title and a beautiful legend
Turning Matplotlib figures into PIL images and adding outside padding
Creating grids of multiple images (like the example above)
Creating a time-lapse animation to show how a population changes over time

You can find the data and my code in this GitHub repository if you want to follow along.

Let's get started.

A quick walkthrough of the data

The original data comes from the World Bank's Population Estimates and Projections dataset, which is licensed under Creative Commons Attribution 4.0. It contains actual values for 1960–2021 and official projections up to 2050.

In the GitHub repository, I've processed the data and created four separate CSV files so that you can focus on making graphs.

Two files, one for females and one for males, have the population in absolute numbers.

The other two have values that describe the ratio of the total population. In the screenshot below, for example, you can see that only 0.336% of the people in Bahrain were between 70–74 years old in 1960.

The dataset has 17 age groups: 00-04, 05-09, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, and 80+.

It also covers over 250 countries and regions, so feel free to create age distribution graphs for the ones that interest you.
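For reference, here is a sketch of the wide layout the functions below appear to expect: one row per country and age group, a country_name column, an indicator_name column, and one column per year whose name is a string such as "2021". The numbers and country names are made up, and judging by the `columns[4:]` slice used later, the real CSVs seem to have four non-year columns, while this toy frame only includes the two the code references. Reversing the filtered rows matters because seaborn draws the first category at the top of a horizontal bar chart; assuming the files list the youngest group first, the reversal puts 80+ on top.

```python
import pandas as pd

# Made-up numbers in the assumed wide layout: one row per country and
# age group, plus one string-named column per year.
toy_female = pd.DataFrame({
    "country_name": ["Country A", "Country A", "Country A", "Country B"],
    "indicator_name": ["00-04", "05-09", "80+", "00-04"],
    "1960": [9.1, 8.0, 0.5, 7.7],
    "2021": [6.2, 6.4, 1.9, 5.8],
})


def select_country(df, country):
    # Keep one country's rows, then reverse their order
    return df[df.country_name == country].iloc[::-1]


country_a = select_country(toy_female, "Country A")
print(country_a["indicator_name"].tolist())  # ['80+', '05-09', '00-04']
print(country_a["2021"].tolist())            # [1.9, 6.4, 6.2]
```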

Creating a first age distribution chart

Now that we understand the data, we can create a simple graph with Seaborn's default settings. I'm using red for females and blue for males.

It's perhaps a bit stereotypical, but making your graphs easy to understand is vital, and the colors are essential for that first interpretation.

The only "trick" is that I multiply the values for males by minus one so that the blue bars go in the opposite direction.
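To see the mirroring trick in isolation, here is a minimal sketch with made-up shares that uses plain Matplotlib's barh() instead of seaborn. This is a swapped-in approach for illustration, not the function below. Note that plain Matplotlib places the first category at the bottom, so the youngest group is listed first here.

```python
import matplotlib.pyplot as plt

MALE_COLOR = "#05B2DC"
FEMALE_COLOR = "#F64740"

# Made-up population shares (%); plain Matplotlib draws the first
# category at the bottom, so the youngest group is listed first.
age_groups = ["00-04", "05-09", "10-14", "15-19"]
male = [4.5, 4.3, 4.1, 3.9]
female = [4.2, 4.1, 3.9, 3.8]

fig, ax = plt.subplots(figsize=(6, 3))
# Negating the male values makes those bars extend to the left of x = 0
male_bars = ax.barh(age_groups, [-v for v in male], color=MALE_COLOR, label="Male")
female_bars = ax.barh(age_groups, female, color=FEMALE_COLOR, label="Female")
ax.axvline(0, color="black", linewidth=0.8)  # the shared baseline
ax.legend()
```

Both series share the same y positions, so each male bar sits directly opposite its female counterpart.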

Here's the function to create the graph.

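import matplotlib.pyplot as plt
import seaborn as sns

# MALE_COLOR and FEMALE_COLOR are defined in the styling section below.
def create_age_distribution(female_df, male_df, country, year):
    # Reverse the row order; seaborn draws the first row at the top of the chart
    df_f = female_df[female_df.country_name == country].iloc[::-1]
    df_m = male_df[male_df.country_name == country].iloc[::-1]

    # Negative male values make the blue bars extend to the left of zero
    ax = sns.barplot(y=df_m["indicator_name"], x=df_m[year] * -1, orient="h", color=MALE_COLOR)
    ax = sns.barplot(y=df_f["indicator_name"], x=df_f[year], orient="h", color=FEMALE_COLOR)

    return ax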

And here's how I use it.

fig = plt.figure(figsize=(10, 7))

ax = create_age_distribution(
    female_df=population_female,
    male_df=population_male,
    country="World",
    year="2021"
)

plt.show()

This is the resulting age distribution graph for the World in 2021. It shows all the data, but it doesn't look great and is difficult to understand.

Let's make it better.

Creating a Seaborn style

The best part about Seaborn is how easy it is to create your own style with sns.set_style(). It accepts a dictionary of Matplotlib rc parameters, such as colors, grid settings, fonts, and spines.

For this tutorial, I've created the following function to try different styles quickly.

def set_seaborn_style(font_family, background_color, grid_color, text_color):
    sns.set_style({
        "axes.facecolor": background_color,
        "figure.facecolor": background_color,

        "axes.labelcolor": text_color,

        "axes.edgecolor": grid_color,
        "axes.grid": True,
        "axes.axisbelow": True,

        "grid.color": grid_color,

        "font.family": font_family,
        "text.color": text_color,
        "xtick.color": text_color,
        "ytick.color": text_color,

        "xtick.bottom": False,
        "xtick.top": False,
        "ytick.left": False,
        "ytick.right": False,

        "axes.spines.left": False,
        "axes.spines.bottom": True,
        "axes.spines.right": False,
        "axes.spines.top": False,
    })

You might want to have something that gives you even more control. I've left out a few options I don't care about here, and I reuse the same colors in multiple places.
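If you want to compare several palettes in one session without each style call overwriting the global settings, one option is to use sns.axes_style() as a context manager: it applies a style dictionary only inside a with block and restores the previous settings afterwards. This is a sketch; DARK_BLUE and draw_with_style() are hypothetical names, and the dictionary is only a subset of the one above.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical subset of the style dictionary, using the tutorial's colors
DARK_BLUE = {
    "axes.facecolor": "#253D5B",
    "figure.facecolor": "#253D5B",
    "grid.color": "#355882",
    "text.color": "#EEEEEE",
    "axes.grid": True,
}


def draw_with_style(style):
    """Draw a small mirrored bar chart with `style` applied only to this figure."""
    with sns.axes_style(style):
        fig, ax = plt.subplots()
        ax.barh(["00-04", "05-09"], [-4.5, -4.3])
        ax.barh(["00-04", "05-09"], [4.2, 4.1])
    # Leaving the with block restores the previous style settings
    return fig, ax


fig, ax = draw_with_style(DARK_BLUE)
```

The axes keep the colors they were created with, so the figure still looks styled after the block exits.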

To run the function, we need to pick background, grid, and text colors. I prefer charts with a colored background because they stand out more on the page. A white background can look good, but it's not my style.

When creating a new color scheme, I often start by finding one color I like. Canva Color Palettes and ColorHunt are good places to look.

After I've found a few colors I like, I generate additional colors with Coolors.

Here's the main color palette I'm using in this tutorial.

Now I can run set_seaborn_style() with the new colors and PT Mono as the font.

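FEMALE_COLOR = "#F64740"
MALE_COLOR = "#05B2DC"
# Reused later by add_legend() and add_padding_to_chart()
background_color = "#253D5B"

# PT Mono must be installed; otherwise Matplotlib warns and falls back to a default font
set_seaborn_style(
    font_family="PT Mono",
    background_color=background_color,
    grid_color="#355882",
    text_color="#EEEEEE"
)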

Here's what the chart looks like now.

It's a clear improvement from what we had before, but it lacks information and is still difficult to understand.

Let's continue by fixing the axes.

Improving the axes

Now that the colors look good, it's time to make the chart more informative.

Here are three things I want to do.

Remove the axis labels because they don't add information
Format the values on the x-axis to make them more informative
Make the text bigger so the graph looks good on smaller screens

The solution consists of two functions.

First, create_x_labels() handles the second bullet point. It lets me quickly adapt the x-axis labels to a country's population size, or switch to ratios instead of absolute numbers.

def create_x_labels(ax, xformat):
    # Skip the first and last tick, matching the ticks format_ticks() keeps
    ticks = ax.get_xticks()[1:-1]

    # abs() makes the negative (male) side read as positive values
    if xformat == "billions":
        return ["{}B".format(round(abs(x / 1e9))) for x in ticks]
    elif xformat == "millions":
        return ["{}M".format(round(abs(x / 1e6))) for x in ticks]
    elif xformat == "thousands":
        return ["{}K".format(round(abs(x / 1e3))) for x in ticks]
    elif xformat == "percentage":
        return ["{}%".format(round(abs(x), 1)) for x in ticks]
    raise ValueError("Unknown xformat: {}".format(xformat))

Second, format_ticks() takes care of the first and third bullet points and calls create_x_labels().

def format_ticks(ax, xformat, xlim=(None, None)):
    ax.tick_params(axis="x", labelsize=12, pad=8)
    ax.tick_params(axis="y", labelsize=12)
    ax.set(ylabel=None, xlabel=None, xlim=xlim)

    plt.xticks(
        ticks=ax.get_xticks()[1:-1],
        labels=create_x_labels(ax, xformat)
    )

The xlim parameter is essential if we want to compare two different age distributions. If we leave it unset, the x-axis adapts to each dataset, so the bars always stretch across the full width and two charts can't be compared visually.
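One thing to be aware of: plt.xticks() pins both the tick positions and their labels, so widening the limits afterwards does not add new ticks. A possible alternative is Matplotlib's FuncFormatter, which formats each tick when the figure is drawn. The sketch below uses a hypothetical make_abs_formatter() helper and made-up numbers; it mirrors the rounding rules of create_x_labels().

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Hypothetical lookup: xformat name -> (divisor, suffix, decimal places)
TICK_FORMATS = {
    "billions": (1e9, "B", 0),
    "millions": (1e6, "M", 0),
    "thousands": (1e3, "K", 0),
    "percentage": (1, "%", 1),
}


def make_abs_formatter(xformat):
    """Return a tick formatter that shows |x|, scaled and suffixed."""
    divisor, suffix, decimals = TICK_FORMATS[xformat]

    def fmt(x, pos):
        value = round(abs(x) / divisor, decimals)
        # Whole numbers print without a trailing ".0"
        if decimals == 0:
            value = int(value)
        return "{}{}".format(value, suffix)

    return FuncFormatter(fmt)


# Made-up populations for two age groups
fig, ax = plt.subplots()
ax.barh(["00-04", "05-09"], [-310e6, -300e6])
ax.barh(["00-04", "05-09"], [290e6, 285e6])
ax.xaxis.set_major_formatter(make_abs_formatter("millions"))
fig.canvas.draw()  # tick labels are generated at draw time
labels = [t.get_text() for t in ax.get_xticklabels()]
```

Because the formatter runs on every draw, you can change xlim later and the new ticks are still labeled consistently.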

I call the new functions when I create the chart. It's exactly like before, but with format_ticks() at the end.

fig = plt.figure(figsize=(10, 7))

ax = create_age_distribution(
    female_df=population_female,
    male_df=population_male,
    country="World",
    year="2021"
)

# New functions
format_ticks(ax, xformat="millions")

plt.show()

Here's what the new graph looks like.

We can also test the percentage format by setting xformat="percentage" and using population_ratio_male and population_ratio_female. I also set xlim=(-10, 10).

It looks good, but we can do even more.

Adding a title and a legend

Two obvious improvements I want to make now are:

Add a title that describes the graph
Add a legend that explains what the bars represent

To create the legend, I wrote the following function that takes x and y parameters to define the location.

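from matplotlib.patches import Patch

def add_legend(x, y):
    # Proxy patches: colored squares that stand in for the two bar series
    patches = [
        Patch(color=MALE_COLOR, label="Male"),
        Patch(color=FEMALE_COLOR, label="Female")
    ]

    # (x, y) is the legend's center in axes coordinates; the frame edge
    # uses the background color so it blends in
    leg = plt.legend(
        handles=patches,
        bbox_to_anchor=(x, y), loc='center',
        ncol=2, fontsize=15,
        handlelength=1, handleheight=0.4,
        edgecolor=background_color
    )
    return leg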

Then, I add this function just like I did with format_ticks() in the previous step.

fig = plt.figure(figsize=(10, 8))

ax = create_age_distribution(
    female_df=population_female,
    male_df=population_male,
    country="World",
    year="2021"
)

# New functions
format_ticks(ax, xformat="millions")
add_legend(x=0.5, y=1.09)
plt.title("Age Distribution for the World in 2021", y=1.14, fontsize=20)

plt.tight_layout()
plt.show()

I've also added plt.title() to add a title.

When I run everything, the age distribution graph looks like this.

It looks fantastic. Let's move on.

Creating a PIL image and adding padding

At some point, I want to turn my figures into images that I can save to disk and customize in other ways.

One such customization is to add some padding around the graph to make it look less cramped.

First, I've created the create_image_from_figure() function that turns a Matplotlib figure into a PIL image.

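import numpy as np
from PIL import Image

def create_image_from_figure(fig):
    fig.tight_layout()

    # Render the figure, then copy its RGBA pixel buffer into a PIL image.
    # buffer_rgba() replaces tostring_rgb(), which was deprecated in
    # Matplotlib 3.8 and later removed.
    fig.canvas.draw()
    image = Image.fromarray(np.asarray(fig.canvas.buffer_rgba())).convert("RGB")
    plt.close(fig)

    return image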
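If you would rather not touch the canvas buffer at all, another route is to save the figure to an in-memory PNG with savefig() and open it with PIL. This also lets you choose the resolution through dpi. A sketch, with figure_to_image_via_savefig() as a hypothetical helper name:

```python
import io

import matplotlib.pyplot as plt
from PIL import Image


def figure_to_image_via_savefig(fig, dpi=100):
    """Render `fig` to PNG in memory and load it as an RGB PIL image."""
    buffer = io.BytesIO()
    # Pass the figure's own facecolor so the background color survives the export
    fig.savefig(buffer, format="png", dpi=dpi, facecolor=fig.get_facecolor())
    plt.close(fig)
    buffer.seek(0)
    image = Image.open(buffer)
    image.load()  # read the pixel data while the buffer is still available
    return image.convert("RGB")


fig = plt.figure(figsize=(4, 3), facecolor="#253D5B")
image = figure_to_image_via_savefig(fig, dpi=50)
print(image.size)  # (200, 150)
```

The output size is figsize multiplied by dpi, which makes it easy to produce larger images for a grid or a GIF.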

And here's a function to add padding.

def add_padding_to_chart(chart, left, top, right, bottom, background):
    size = chart.size
    image = Image.new("RGB", (size[0] + left + right, size[1] + top + bottom), background)
    image.paste(chart, (left, top))
    return image
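Pillow also has a built-in helper for this, ImageOps.expand(), whose border argument accepts a (left, top, right, bottom) tuple. Here is a minimal sketch with a stand-in image, assuming a reasonably recent Pillow; add_padding_with_imageops() is a hypothetical name.

```python
from PIL import Image, ImageOps


def add_padding_with_imageops(chart, left, top, right, bottom, background):
    # border is given in (left, top, right, bottom) order
    return ImageOps.expand(chart, border=(left, top, right, bottom), fill=background)


# A stand-in "chart": a plain light-gray rectangle
chart = Image.new("RGB", (100, 50), "#EEEEEE")
padded = add_padding_with_imageops(chart, 20, 20, 20, 5, "#253D5B")
print(padded.size)  # (140, 75)
```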

Again, I add these functions to the original code that creates the graph. It now looks like this.

fig = plt.figure(figsize=(10, 8))

ax = create_age_distribution(
    female_df=population_female,
    male_df=population_male,
    country="World",
    year="2021"
)

# New functions
format_ticks(ax, xformat="millions")
add_legend(x=0.5, y=1.09)
plt.title("Age Distribution for the World in 2021", y=1.14, fontsize=20)

image = create_image_from_figure(fig)
image = add_padding_to_chart(image, 20, 20, 20, 5, background_color)

And here's the resulting graph.

To my eye, this looks close to perfect. I have two more things I want to show: how to create grids and time-lapse visualizations.

Let's start with the first.

Creating a grid with multiple countries

You can use plt.subplots() to create grids, but in this tutorial, I want to create a grid of images because I think it looks better.
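For comparison, here is a rough sketch of the plt.subplots() route, with made-up numbers for four hypothetical countries. Passing sharex=True links the x-axes, so a single set_xlim() call keeps every panel on the same scale, which addresses the same comparability problem that xlim solves.

```python
import matplotlib.pyplot as plt

# Made-up shares (%) for two age groups: (male, female) per country
data = {
    "Country A": ([4.5, 4.3], [4.2, 4.1]),
    "Country B": ([2.1, 2.4], [2.0, 2.3]),
    "Country C": ([6.8, 6.1], [6.5, 6.0]),
    "Country D": ([3.0, 3.2], [3.1, 3.3]),
}
age_groups = ["00-04", "05-09"]

# sharex=True links the x-axes of all panels
fig, axes = plt.subplots(2, 2, figsize=(10, 7), sharex=True)
for ax, (country, (male, female)) in zip(axes.flat, data.items()):
    ax.barh(age_groups, [-v for v in male], color="#05B2DC")
    ax.barh(age_groups, female, color="#F64740")
    ax.set_title(country)

# One call updates every linked axis
axes[0, 0].set_xlim(-10, 10)
fig.tight_layout()
```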

The following function takes a list of images and arranges them in a grid with ncols columns. It works by creating an empty, transparent image that's large enough to fit all the figures and then pasting each figure into its slot.

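import math

def create_grid(figures, pad, ncols):
    # Round up so a partially filled last row still gets space
    nrows = math.ceil(len(figures) / ncols)
    # All figures are assumed to have the same size as the first one
    size = figures[0].size

    # Transparent canvas ("#ffffff00") big enough for every figure plus padding
    image = Image.new(
        "RGBA",
        (ncols * size[0] + (ncols - 1) * pad, nrows * size[1] + (nrows - 1) * pad),
        "#ffffff00"
    )

    # Paste each figure left to right, top to bottom
    for i, figure in enumerate(figures):
        col, row = i % ncols, i // ncols
        image.paste(figure, (col * (size[0] + pad), row * (size[1] + pad)))

    return image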

In the following code, I iterate over a list of countries, add the resulting graph to figures, and create a grid by running create_grid() at the end.

figures = []

for country in [
    "United States", "China", "Japan", "Brazil", "Canada",
    "Germany", "Pakistan", "Russian Federation", "Nigeria", 
    "Sweden", "Cambodia", "Saudi Arabia", "Iceland",
    "Spain", "South Africa", "Morocco"
]:

    fig = plt.figure(figsize=(10, 8))

    ax = create_age_distribution(
        female_df=population_ratio_female,
        male_df=population_ratio_male,
        country=country,
        year="2021"
    )

    ax.set(xlim=(-10, 10))

    # New functions
    format_ticks(ax, xformat="percentage")
    add_legend(x=0.5, y=1.09)
    plt.title("Age Distribution for {} in 2021".format(country), y=1.14, fontsize=20)

    image = create_image_from_figure(fig)
    image = add_padding_to_chart(image, 20, 20, 20, 5, background_color)

    figures.append(image)

grid = create_grid(figures, pad=20, ncols=4)

Note that I use ratios instead of absolute numbers and set xlim=(-10, 10). Otherwise, I won't be able to compare the countries to each other visually.

Let's move on to the last part of this tutorial: how to create time-lapse visualizations.

Creating a time-lapse visualization

The static age distribution charts look great, but it's fascinating to see how they change over time.

Since we have actual values from 1960 to 2021 and predictions to 2050, we can create a time-lapse animation for a relatively long period.

Before we begin, I need to tell you that the font I use, PT Mono, doesn't have the same height for all characters. To make the visualization look good, I used plt.text() for the year instead of including it in plt.title(). If you use a different font, you may not need this workaround.

Here's the code:

images = []
# Every column after the first four holds one year of data
years = list(population_male.columns[4:])

for year in years:
    fig = plt.figure(figsize=(10, 8))

    ax = create_age_distribution(
        female_df=population_female,
        male_df=population_male,
        country="World",
        year=year
    )

    # New functions
    format_ticks(ax, xformat="millions", xlim=(-400000000, 400000000))
    add_legend(x=0.5, y=1.09)
    plt.title("Age Distribution for the World in      ", y=1.14, fontsize=21)
    plt.text(x=0.77, y=1.15, s=str(year), fontsize=21, transform=ax.transAxes)

    image = create_image_from_figure(fig)
    image = add_padding_to_chart(image, 20, 20, 20, 5, background_color)
    images.append(image)

I use imageio to create a GIF from the list of images.

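import imageio

# Duplicating the last frames to add a delay
# before the animation restarts
images = images + [images[-1] for _ in range(20)]
# Depending on the imageio version and the plugin it picks for GIFs, duration
# may be read as seconds or milliseconds; adjust it if the GIF plays too fast.
imageio.mimwrite('./time-lapse.gif', images, duration=0.2)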
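Pillow can also write the GIF directly, which is a swapped-in alternative to imageio rather than the approach used above. Pillow's duration is in milliseconds and can be a list with one value per frame, so giving the last frame a longer duration can replace the duplicated frames. A sketch, with save_gif() as a hypothetical helper:

```python
from PIL import Image


def save_gif(frames, path, frame_ms=200, final_pause_ms=4000):
    """Write PIL images as a looping GIF, holding the last frame longer."""
    # One duration (in milliseconds) per frame; the last one is longer
    durations = [frame_ms] * (len(frames) - 1) + [final_pause_ms]
    frames[0].save(
        path,
        format="GIF",
        save_all=True,              # write every frame, not just the first
        append_images=frames[1:],
        duration=durations,
        loop=0,                     # 0 means repeat forever
    )
```

To use it, build `images` with the loop above, skip the frame-duplication step, and call `save_gif(images, "./time-lapse.gif")`.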

Let's take a look at the result.

Awesome! That's all for this tutorial; let me know if you liked it and learned something useful.

Conclusion

This was a fun tutorial to write, and I hope you enjoyed it.

Age distributions are a great visualization of a country's demographics, and now you've seen a few ways to make them stand out.

We've learned to create styles, grids, and animations. Writing functions like I've done here is also great if you want to test different ideas and styles quickly.

I hope you learned something that you will use in the future.

Thank you for taking the time to read my tutorial. Let me know if you enjoy this type of content.

I can create more tutorials if people want them!
