Predicting Population Decline with Python

Author:Murphy | View: 25133 | Time: 2025-03-22 23:04:26

Quick Success Data Science

In the movie _Avengers: Infinity War_, the supervillain Thanos snaps his fingers, and the Infinity Gauntlet randomly disintegrates half the life in the universe. He does this so that the other half might prosper, but he only got it "half" right. If he wanted to fix the "problem," he would've snapped again to ensure that the new, smaller population didn't recover over time.

Here's the thing. Thanos reduced the human population from around 8 billion to 4 billion, about what it was in 1974. But it took only 49 years to grow from 4 billion to 8 billion in 2023. That, in itself, is a finger snap on the cosmological timescale.

Annual world population since 10,000 BCE (Wikipedia (CC BY-SA 4.0 Deed))

Thanos needed a second snap to permanently set the _total fertility rate (TFR)_ equal to the fertility replacement rate (FRR).

The TFR of a population is the average number of children born to a woman over her lifetime. The FRR is the number of children each woman needs to have on average to replace the current generation and maintain a stable population.

The FRR is nominally 2.1. That's a little larger than 2.0 to account for slight asymmetry in the number of male children versus female children and accelerated deaths due to wars, famine, and pestilence. You can see a fun video about it here.

In truth, the FRR varies by population and over time, from as low as 2.06 children per woman to well over 3. Populations with high infant mortality rates or a high ratio of males to females will require a higher FRR to stay in balance.

In this Quick Success Data Science project, we'll first examine how quickly humanity could have recovered from "The Snap", and then look at how fertility rates are affecting the populations of some major countries like China and Japan.

Along the way, we'll cover some important coding subjects like:

How to loop over items in a Python dictionary.
How to create a pandas DataFrame from a dictionary.
How to plot bar charts and stack plots with Matplotlib.
How to plot a Matplotlib table from a DataFrame.
How to combine multiple plots into a montage.

The technique we'll use to project populations is unsophisticated, but that doesn't mean it's not useful. Back-of-the-envelope modeling is a great way to quickly scope out limits and boundaries, test assumptions, and generate ballpark values for high-level planning. It's especially helpful for the early identification of issues and tasks that appear to be important but are ultimately a waste of resources and time.

Repopulating the Earth after "The Snap"

We'll start by estimating how long it would take Earth's human population to recover after "The Snap." We'll assume that plant and other animal life will keep pace with human life, so there will be no food shortages to complicate things.

The Equation for Population Growth

TFR and the long-term population growth rate, g, are closely related. For a population structure in a steady state, the growth rate is given by the following equation:

$$g = \frac{\ln(\mathrm{TFR} / 2)}{X_m}$$

The Xm variable is the mean age for childbearing women. This corresponds to a generation, defined as the average time between the birth of parents and the birth of their offspring.

Once the growth rate is known, it can be plugged into the next equation to predict future populations:

$$P_{\text{future}} = P_{\text{current}} \times (1 + g)^{\text{years}}$$

The years variable represents the number of years the population has been growing.
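As a minimal sketch, the two equations can be wrapped in small helper functions. The function names `growth_rate` and `project_population` are illustrative and not part of the original project, and the sketch assumes the TFR and mean childbearing age stay constant over the projection period.

```python
import math


def growth_rate(tfr, mean_childbearing_age):
    """Return the long-term annual growth rate g for a steady-state population."""
    return math.log(tfr / 2) / mean_childbearing_age


def project_population(start_population, g, years):
    """Compound the starting population at annual rate g for the given years."""
    return start_population * (1 + g) ** years


# A TFR of exactly 2 gives zero growth in this simple model:
g_flat = growth_rate(2.0, 28)
flat_population = project_population(1_000_000, g_flat, 50)
print(g_flat, flat_population)
```

A TFR above 2 produces a positive rate and a TFR below 2 a negative one, which is the behavior the rest of the article relies on.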

It's always good to run a "sanity check" on equations to ensure they're working as you expect. In the next section, we'll use this equation on the last doubling period between 1974 and 2023.

Not only do we know (within reason) the annual world population over the last 50 years, but we also have estimates of the annual TFR and mean childbearing age. We can use these numbers to judge how well the growth equation works. If it performs well in a backcast, we can expect it to do the same in a forecast.

Datasets

Data on TFR and childbearing ages can be found on the Our World in Data website and on [Wikipedia](https://en.wikipedia.org/wiki/World_population), respectively. For comparison, alternative data can be examined on the macrotrends and database.earth websites. Data on historical population estimates can be found on Wikipedia. The Our World in Data and Wikipedia datasets are made available under a CC BY 4.0 DEED Attribution 4.0 International license.

Testing the Equation

An assumption built into our growth equation is that the TFR and mean childbearing age (Xm) remain the same over time. In reality, these will be somewhat variable, as shown here for TFR and [here](https://database.earth/population/mean-age-childbearing) for childbearing age.

To address this, I weighted the TFR and Xm values by population at selected dates between 1974 and 2023. The results for TFR are shown below.

Population-weighted TFR values (by the author from Wikipedia (CC BY-SA 4.0 Deed))
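The weighting step can be sketched as an ordinary weighted mean. The article does not list the exact dates and values used, so the numbers below are placeholders chosen only to show the mechanics, and the `weighted_mean` helper is a hypothetical name. They will not reproduce the author's figures.

```python
def weighted_mean(values, weights):
    """Return the mean of values, with each value weighted by its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Placeholder inputs (NOT the author's data): TFR at a few sample dates,
# weighted by the world population (in billions) at those dates.
sample_tfr = [4.3, 3.3, 2.6, 2.3]
sample_population = [4.0, 5.3, 6.6, 8.0]

weighted_tfr = weighted_mean(sample_tfr, sample_population)
print(f"Population-weighted TFR: {weighted_tfr:.3f}")
```

Because later dates carry larger populations, they pull the average toward the lower, more recent TFR values.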

The resulting population-weighted TFR was 3.019. A similar process for Xm yielded an average childbearing age of 27.83. These numbers were plugged into the equation as shown below:

import math

TFR = 3.019  # Population-weighted average
MEAN_CHILDBEARING_AGE = 27.83  # Population-weighted average
POPULATION_2023 = 8_000_000_000
POPULATION_1974 = 4_000_000_000
NUMBER_OF_YEARS = 49  # 1974-2023

population = POPULATION_1974

growth_rate = math.log(TFR / 2) / MEAN_CHILDBEARING_AGE
population *= (1 + growth_rate) ** NUMBER_OF_YEARS

print(f"Years: {NUMBER_OF_YEARS}")
print(f"Growth rate: {growth_rate:,.4f}")
print(f"Population: {population:,.0f}")

Years: 49
Growth rate: 0.0148
Population: 8,215,290,834

The result is within 3% of the actual answer. That's acceptable for the back-of-the-envelope work we're trying to do.

One thing to note here is that, if we'd used the 1974 values (TFR=4.26 and Xm=28.4), we would've predicted a 2023 population of 14.5 billion! Going forward, we won't have the luxury of hindsight. So, we'll need to incorporate boundaries – such as using a TFR value close to 2.0 – into the analysis.
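To see where that overshoot comes from, the same calculation can be rerun with the 1974 values quoted above. This is only a quick check; the variable names are chosen for illustration.

```python
import math

TFR_1974 = 4.26
XM_1974 = 28.4
POPULATION_1974 = 4_000_000_000
NUMBER_OF_YEARS = 49  # 1974-2023

# Same growth equation, but frozen at the 1974 fertility and childbearing age:
growth_rate_1974 = math.log(TFR_1974 / 2) / XM_1974
population_2023 = POPULATION_1974 * (1 + growth_rate_1974) ** NUMBER_OF_YEARS

print(f"Growth rate: {growth_rate_1974:.4f}")
print(f"Population: {population_2023 / 1e9:.1f} billion")
```

The growth rate is nearly double the population-weighted one, and compounding it over 49 years is what inflates the estimate.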

Confirming the Equation with Euler's Number

We can also test the previous equation by using the calculated growth rate in an alternative equation that uses Euler's number (e):

$$P(t) = P_0 e^{rt}$$

where P_0 is the starting population, r represents the growth rate, and t is the elapsed time in the same units as the rate (years here).

Here's the code:

import math

GROWTH_RATE = 0.014796
POPULATION_2023 = 8_000_000_000
POPULATION_1974 = 4_000_000_000
NUMBER_OF_YEARS = 49  # 1974-2023

pred_popl_2023 = (POPULATION_1974 *
                  math.exp(GROWTH_RATE * NUMBER_OF_YEARS))

print(f"Years: {NUMBER_OF_YEARS}")
print(f"Growth rate: {GROWTH_RATE:,.4f}")
print(f"Population: {pred_popl_2023:,.0f}")

Years: 49
Growth rate: 0.0148
Population: 8,258,957,436

Once again, we're close to the target value of 8 billion. Now that we've checked the equation using backcasting, we can proceed to forecasting.
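The small gap between the two results (about 8.22 versus 8.26 billion) appears to come from the compounding convention rather than from the data: one form compounds once a year, the other continuously. The sketch below, which reuses the constants from the earlier listings, compares the two and shows that they agree once the annual rate is converted to its continuous equivalent, ln(1 + g).

```python
import math

TFR = 3.019
MEAN_CHILDBEARING_AGE = 27.83
POPULATION_1974 = 4_000_000_000
NUMBER_OF_YEARS = 49  # 1974-2023

g = math.log(TFR / 2) / MEAN_CHILDBEARING_AGE

# Annual compounding, as in the first listing:
annual = POPULATION_1974 * (1 + g) ** NUMBER_OF_YEARS

# Continuous compounding with the same rate, as in the second listing:
continuous = POPULATION_1974 * math.exp(g * NUMBER_OF_YEARS)

# Continuous compounding with the equivalent rate ln(1 + g):
equivalent = POPULATION_1974 * math.exp(math.log(1 + g) * NUMBER_OF_YEARS)

print(f"Annual:     {annual:,.0f}")
print(f"Continuous: {continuous:,.0f}")
print(f"Equivalent: {equivalent:,.0f}")
```

For growth rates this small the difference is well under 1%, which is negligible for back-of-the-envelope work.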

CAUTION: Python's math module uses log() for the natural logarithm and log10() for the base 10 logarithm. This can trip you up if you're used to "ln" representing the natural log.
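A short illustration of the logarithm functions in the standard `math` module (the value of `x` is arbitrary):

```python
import math

x = 1000

natural_log = math.log(x)      # Natural logarithm (base e), often written ln(x)
common_log = math.log10(x)     # Base-10 logarithm
base_two_log = math.log(x, 2)  # Optional second argument sets the base

print(natural_log, common_log, base_two_log)
```

The growth-rate equation needs the natural logarithm, so `math.log()` with a single argument is the right call.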

Predicting the Post-snap Population

The Infinity War movie came out fairly recently (2018), so we'll use (near) current values for TFR and Xm. Based on the previously referenced Our World in Data and Wikipedia websites, the 2021 global TFR was 2.3 and the 2021 average childbearing age was between about 27 and 29 years.

Here's the code:

import math

TFR = 2.3
INCREMENT_YEARS = 10 
MEAN_CHILDBEARING_AGE = 28
PRE_SNAP_POPULATION = 8_000_000_000
POST_SNAP_POPULATION = 4_000_000_000

population = POST_SNAP_POPULATION
years_from_snap = INCREMENT_YEARS

growth_rate = math.log(TFR / 2) / MEAN_CHILDBEARING_AGE

# Calculate population in n-year increments after snap:
while population < PRE_SNAP_POPULATION:
    population *= (1 + growth_rate) ** INCREMENT_YEARS
    print(f"Years from snap: {years_from_snap}\t"
          f"Population: {population:,.0f}")
    years_from_snap += INCREMENT_YEARS

And here's the output:

Years from snap: 10  Population: 4,204,204,846
Years from snap: 20  Population: 4,418,834,597
Years from snap: 30  Population: 4,644,421,456
Years from snap: 40  Population: 4,881,524,798
Years from snap: 50  Population: 5,130,732,553
Years from snap: 60  Population: 5,392,662,666
Years from snap: 70  Population: 5,667,964,628
Years from snap: 80  Population: 5,957,321,089
Years from snap: 90  Population: 6,261,449,548
Years from snap: 100  Population: 6,581,104,134
Years from snap: 110  Population: 6,917,077,473
Years from snap: 120  Population: 7,270,202,658
Years from snap: 130  Population: 7,641,355,311
Years from snap: 140  Population: 8,031,455,757

Wow. The Earth could potentially restore its population in about 140 years, or two human lifetimes. I doubt that's what Thanos had in mind.

But what if Thanos had instructed the Infinity Gauntlet to reduce TFR after halving the population? Here's the output using a TFR of 2.0001 and an Xm of 33:

Years from snap: 457,490  Population: 8,000,013,060

Now that made a difference. Assuming no major wars or other calamities, it would take humanity over 450,000 years to reach a population of 8 billion.
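Instead of stepping forward in 10-year increments, the recovery time can also be solved for directly, since doubling requires (1 + g) raised to the number of years to equal 2. The sketch below does this; `years_to_double` is a hypothetical helper name, and the inputs are the TFR and Xm pairs used above.

```python
import math


def years_to_double(tfr, mean_childbearing_age):
    """Return the years needed for a population to double at a constant TFR."""
    g = math.log(tfr / 2) / mean_childbearing_age
    if g <= 0:
        raise ValueError("The population never doubles unless the TFR exceeds 2.")
    return math.log(2) / math.log(1 + g)


fast = years_to_double(2.3, 28)
slow = years_to_double(2.0001, 33)

print(f"TFR 2.3, Xm 28:    {fast:,.0f} years")
print(f"TFR 2.0001, Xm 33: {slow:,.0f} years")
```

The loop reports the first 10-year step at or beyond these values, which is why its answers are slightly larger.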

NOTE: Using TFR = FRR = 2.1 with our simple growth equation will result in a growing – rather than stable – population. This is because our equation does not factor in the mortality issues (such as war, famine, and disease) that the value of 2.1 is designed to address.
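A quick check of that point, assuming a mean childbearing age of 28 (the value used in the earlier listing):

```python
import math

FRR = 2.1
MEAN_CHILDBEARING_AGE = 28  # Assumed here; same value as the post-snap example

# With TFR set to the nominal replacement rate, the simple model still grows:
g = math.log(FRR / 2) / MEAN_CHILDBEARING_AGE
century_factor = (1 + g) ** 100

print(f"Growth rate: {g:.5f}")
print(f"Growth over 100 years: {(century_factor - 1) * 100:.0f}%")
```

In this model only a TFR of exactly 2.0 gives zero growth, which is why the slowed-down scenario above uses 2.0001 rather than 2.1.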

To my knowledge, there's no evidence that Thanos tampered with the TFR in the Infinity War saga. That means Earth could be repopulated in a cosmic heartbeat. Thanos wasted his snap.

Projecting Population Changes with the TFR

Today, the TFR in many nations has fallen below 2.0. This means that, barring migration, their populations are both falling and growing older.

Let's project what this looks like over roughly the next 40 years (2022 to 2060) and summarize it using a combination of Matplotlib chart types. We'll restrict our analysis to China, Russia, the USA, and Japan to keep things simple.

We'll use TFR data from the Our World in Data website and childbearing ages and population data from Wikipedia. These datasets are all licensed under a CC BY 4.0 DEED Attribution 4.0 International license.

Installing Third-party Libraries

You'll need Matplotlib for plotting and pandas for manipulating the data. You can find installation instructions for your system in the previous hyperlinks.

Populating the DataFrame

The following code is annotated, but one thing worth highlighting here is that we use the items() method to access dictionary keys and values when looping over the TFR dictionary.

import math
import matplotlib.pyplot as plt
import pandas as pd 

START_YEAR = 2022
years = [0, 8, 18, 28, 38]  # Offsets from START_YEAR
year = [START_YEAR + offset for offset in years]  # Calendar years

# 2022 Total Fertility Rate by country:
TFR = {'China':  1.66, 
       'USA':    1.89,
       'Russia': 1.79,
       'Japan':  1.53}

# 2021 Mean Childbearing Age by country: 
Xm = {'China':  28.8,  # Probably closer to 29 in 2022
      'USA':    29.6,
      'Russia': 28.7,
      'Japan':  31.4}

# 2022 population in millions by country:
popl_millions = {'China': [1426],
                 'USA':    [338],
                 'Russia': [147],
                 'Japan':  [124]}

# Calculate future populations and create DataFrame:
for country, rate in TFR.items():
    start_popl = popl_millions[country][0]
    age = Xm[country]
    for yr in years[1:]:  # Start from the second year
        growth_rate = math.log(rate / 2) / age
        popl_new = start_popl * (1 + growth_rate) ** yr
        popl_millions[country].append(round(popl_new))

# Create a DataFrame from the population dictionary:
df = pd.DataFrame(popl_millions)

# Add a column for years:
df.insert(loc=0, column='Year', value=year)

# Display the full DataFrame (display() is available in Jupyter/IPython):
display(df)

Here's the whole DataFrame:
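Outside a notebook, where `display()` isn't available, the table can be printed as plain text. The sketch below rebuilds the same projection in a slightly more compact form, using the input values from the listing above. The names `OFFSETS`, `XM`, `POPL_2022`, and `df_check` are chosen for illustration only.

```python
import math

import pandas as pd

START_YEAR = 2022
OFFSETS = [0, 8, 18, 28, 38]  # Years after START_YEAR

TFR = {'China': 1.66, 'USA': 1.89, 'Russia': 1.79, 'Japan': 1.53}
XM = {'China': 28.8, 'USA': 29.6, 'Russia': 28.7, 'Japan': 31.4}
POPL_2022 = {'China': 1426, 'USA': 338, 'Russia': 147, 'Japan': 124}

# Build one column per country, looping over the dictionary with items():
projections = {'Year': [START_YEAR + offset for offset in OFFSETS]}
for country, rate in TFR.items():
    g = math.log(rate / 2) / XM[country]
    projections[country] = [round(POPL_2022[country] * (1 + g) ** offset)
                            for offset in OFFSETS]

df_check = pd.DataFrame(projections)
print(df_check.to_string(index=False))
```

Each row holds the projected populations, in millions, for one of the years 2022, 2030, 2040, 2050, and 2060.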

Making the Matplotlib Montage

To present the results, we'll use a multipaneled plot with one row and three columns. We'll show the total fertility rates in a bar chart, the DataFrame in a matplotlib table, and the declining population using a stack plot. Stack plots are generated by plotting different datasets vertically on top of one another rather than overlapping with one another.

Here's the code. We'll cover some of the programming details at the end.

# Build a multi-panel figure with 3 plots in a single row:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(14, 6))
fig.suptitle('Predicted Post-2022 Population Changes in Millions*', 
             fontsize=25)

# Create a Fertility Rate bar chart:
ax1.bar(TFR.keys(), 
        TFR.values(), 
        color=['tab:blue', 'tab:orange', 
               'tab:green', 'tab:red'])
ax1.set_title('Fertility Rate')

# Create a Population table:
bbox = [0, 0, 1, 1]
ax2.axis('off')
ax2.set_title('Projected Population Table')
mpl_table = ax2.table(cellText=df.values, 
                      bbox=bbox, 
                      colLabels=df.columns, 
                      colColours=['white', 'tab:blue', 'tab:orange', 
                                  'tab:green', 'tab:red'])
mpl_table.auto_set_font_size(True)
mpl_table.set_fontsize(12)

# Create a stack plot of Population Decline:
ax3.stackplot(df['Year'], 
              [df['China'], df['USA'], 
               df['Russia'], df['Japan']])
ax3.set_title('Modeled Population Decline')
ax3.set_xlabel('Year', fontsize=14)
ax3.set_ylabel('Population in Millions', fontsize=14)
ax3.yaxis.tick_right()
ax3.grid()
ax3.legend(labels=df.columns[1:5], loc='lower left')
ax3.text(2022, -295, '*Based solely on birthrates; no migration effects')
ax3.margins(0, 0)  # Set margins to avoid whitespace in the graph.

# Add a border to the montage:
fig.patch.set_linewidth(3)
fig.patch.set_edgecolor('k')

# plt.show()  # Uncomment when running outside a Jupyter notebook

Here's the result, as viewed in JupyterLab:

In the previous code, we used Matplotlib's plt.subplots() function to build the multipaneled display by passing it the number of rows (1), followed by the number of columns (3). The three plots were named, by convention, ax1, ax2, and ax3 (see Demystifying Matplotlib for details).

For the Matplotlib table, we first used bbox = [0, 0, 1, 1] to define a bounding box. The values represent the left and bottom coordinates followed by the width and height, all in axes coordinates. Because the box starts at (0, 0) and has a width and height of 1, it spans the entire ax2 subplot.

To make the table itself, we called Matplotlib's table() method on ax2 and passed it the DataFrame population values, the bounding box to use, the column labels (taken from the DataFrame), and an optional list of colors for the labels. We assigned the output to the mpl_table variable.

Next, we used the auto_set_font_size() method to automatically adjust the text font to fit the table cells. This isn't strictly necessary based on the font size we're using (12). Nevertheless, it's a handy method to be aware of for ensuring your table content fits inside the bounding box.
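A stripped-down sketch of the table step on its own may help isolate the moving parts. The tiny DataFrame and its column names are made up for illustration, and the non-interactive Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data, for illustration only:
demo_df = pd.DataFrame({'Year': [2022, 2030],
                        'A': [100, 95],
                        'B': [50, 52]})

fig, ax = plt.subplots(figsize=(4, 2))
ax.axis('off')  # Hide the axes so only the table is visible

# bbox is [left, bottom, width, height] in axes coordinates:
demo_table = ax.table(cellText=demo_df.values.tolist(),
                      colLabels=list(demo_df.columns),
                      bbox=[0, 0, 1, 1])
fig.canvas.draw()
```

Shrinking the box, for example to `[0.1, 0.1, 0.8, 0.8]`, would leave a margin around the table inside the subplot.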

We ended with the stack plot, where multiple variables are stacked on top of each other to visualize their cumulative effect over a specified period. Each variable is represented by a filled area, and the total height of the stack at any given point represents the sum of the individual contributions.

To make the plot, we used Matplotlib's stackplot() method and passed it the DataFrame's Year column for the x-axis and a list of the country columns to stack along the y-axis.
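A minimal stack plot, using made-up numbers and region names, shows the call in isolation. The Agg backend is again selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt

# Made-up data, for illustration only:
years = [2022, 2030, 2040]
region_a = [10, 9, 8]
region_b = [5, 5, 4]

fig, ax = plt.subplots()

# Each series is drawn on top of the previous one:
layers = ax.stackplot(years, [region_a, region_b],
                      labels=['Region A', 'Region B'])
ax.legend(loc='lower left')

# The top edge of the stack is the element-wise sum of the series:
stack_top = [a + b for a, b in zip(region_a, region_b)]
print(stack_top)
```

The order of the list determines the stacking order, so the first series sits at the bottom of the plot.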

The Recap

The population projection approach we've used here is simplistic and doesn't take into account things like increasing life expectancies and changing immigration patterns. Nevertheless, the results are directionally similar to more sophisticated analyses that you can find online.

For example, the UN's minimum case estimate for China's population in 2050 is around 1.2 billion, whereas we predicted 1.189 billion. For the same year, the Population Pyramid's projection for Japan is 103.8 million, close to our 98 million. Their Russian prediction is 133 million, while we predicted 132 million. With zero net migration, the Center for Immigration Studies estimates a US population of 336 million, not far off our 320 million.

There are at least two things we can glean from this analysis. Firstly, Thanos didn't think things through. Secondly, population projections for developed countries look pretty dire.

Japan, in particular, is in bad shape. In a speech last year, Prime Minister Kishida Fumio stressed that the birth rate had fallen "to the brink of not being able to maintain a functioning society." The cause of this is something of a mystery, though a lack of jobs for young men may be a culprit.

China is in a similar position. This is in part the result of its _one-child policy_, high living and education costs, and skyrocketing property prices. City people, in particular, face stagnating wages, fewer job opportunities, and grueling work hours that make it both difficult and expensive to raise multiple children.

In Russia, in the summer of 2022, President Vladimir Putin reinstated a Soviet-era award, giving women who have ten or more children a single payment of a million rubles. The honorary title and certification of "Mother Heroine" are given to the mother once her tenth living child turns 1 year old.

Other countries make up for declining birth rates through increased immigration. The United States, for example, is expected to maintain or slightly increase its current population, despite falling birth rates.

Finally, I hope you gained an appreciation for how even simple modeling techniques can lead to important insights. Python, along with third-party tools like pandas and Matplotlib, makes it easy to analyze data and produce compelling visualizations.