Comparing Country Sizes with GeoPandas
Quick Success Data Science

I live in Texas (Yee-hah!), and we love comparing Texas to other states, countries, islands, asteroids, you name it. We'll even compare Texas to Texas:

I love memes that compare the size of one geographical region to another. Despite possessing a good knowledge of geography, I'm often surprised. For example, is Australia bigger than Brazil? Is New Zealand smaller than Italy? How does Texas stack up to Germany, India, and Chile? Here are the answers:





Besides being entertaining, these comparisons make great learning tools for students.
In this Quick Success Data Science project, we'll write Python code for easily overlaying and comparing maps of various countries or US states. This is a fun way to learn about geospatial datasets and how to manipulate them with the GeoPandas library.
The Process
Here's the general process we'll use:
- Load shapefiles (boundary polygons) for each country or state.
- Project the boundaries using a coordinate reference system.
- Shift the centroid of one boundary to match the other.
- Shift and rotate one boundary until it fits well in the other.
- Plot the boundaries.
The GeoPandas Library
GeoPandas is an open-source, third-party library designed to support geospatial mapping in Python. It's built on the pandas data analysis library and makes working with geospatial vector data similar to working with tabular data. It also enables operations in Python that would otherwise require a dedicated geospatial database, such as Post GIS.
Geopandas' GeoDataFrame is a pandas DataFrame with a special "geometry" column for location data. This column bundles together the type of geometric object (such as a point, line string, polygon, etc.) and the coordinates (longitude and latitude) needed to draw it.

You can find installation instructions for GeoPandas here. As always, conda users have it easy (conda install geopandas
). If you're using pip, you'll also need to install the main dependencies of shapely, pyproj, fiona, pyogrio, and rtree, as specified in the instructions.
We'll also need Matplotlib for plotting. As an optional dependency of GeoPandas, it should be automatically installed when taking the conda route.
Shapefiles
For country and state boundaries, we'll rely on shapefiles. A shapefile is a Geospatial vector data format for geographic information system (GIS) software. It's used to spatially describe vector features like lines, points, and polygons. Items in the file include descriptive attributes, such as a name.

A "Shapefile" isn't a single file but a collection of files in a folder. The figure below shows an example shapefile for the outlines of major lakes. Notice how the filenames in this folder use the folder name as a prefix.

Three files are mandatory and have filename extensions of .shp
, .shx
, and .dbf
. While the actual shapefile relates specifically to the .shp
file, the other supporting files are required for capturing the shape geometry. For more details on these files, check out the Wikipedia page on shapefiles and this article:
We'll need to provide GeoPandas with the directory path to the shapefiles folder (zipped or unzipped). GeoPandas will use the files in this folder to automatically produce the geometry
column in the GeoDataFrame. What could be easier?
The Country Comparison Code
The following code enables comparing the maps of two countries by overlaying them at approximately the same scale. Later in the article, we'll look at comparing US states.
Importing Libraries
GeoPandas, like pandas, is built on Matplotlib, so we'll import Matplotlib for plotting.
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import geopandas as gpd
Assigning Constants
For convenience, we'll assign constants at the program's top, where they'll be easy to find and change. These include the country name, a rotation angle (to better "fit" one country into another), and a coordinate reference system (CRS) for projecting from a 3D globe to a 2D map.
# Assign constants:
COUNTRY1 = 'China' # Plots in the background (back).
COUNTRY2 = 'Australia' # Plots in the foreground (front).
ROTATE = -17 # Direction and amount to rotate COUNTRY2
CRS_1 = 'EPSG:3415' # Coordinate Reference System for COUNTRY1
CRS_2 = 'EPSG:3112' # Coordinate Reference System for COUNTRY2
The CRS is important for accurately comparing the area of one country to another. You're probably familiar with standard _Mercator_ map projections and how they distort the size of countries near the poles:

Greenland rivals South America in size and Antarctica is simply outrageous. This is the result of north-south lines of longitude converging at the poles. Because these lines are non-parallel, flattening them onto a 2D map causes increasing distortion away from the equator:

This distortion can seriously screw up size comparisons. Here's what happens when you compare Australia to a Mercator projection of Canada versus a Lambert projection that better preserves area:

Fortunately, there are many ways to project from a sphere onto a plane, and some preserve area. Unfortunately, that comes with a price.
Map projections are like squeezing a balloon. If you get one parameter, like area, in hand, another parameter, like angles or shape, bulge out between your fingers.
While there are "equal area" projections, such as the _Lambert Azimuthal Equal-Area Projection or the Albers Equal-Area Conic Projection_, that preserve the relative sizes of areas on a map, they distort country outlines. If people notice that the shapes of countries are unfamiliar, they'll doubt the veracity of the display, and we can't have that!
The best solution I've found is to project each country with its recommended local projection system and then move and plot one on top of the other. Local projections are best for locally preserving area and shape. The downside is that there are many local projections to deal with. Nevertheless, this is more satisfactory than using a single projection for both countries, especially if they lie within different latitude ranges.
We'll select projections using an EPSG code. According to Wikipedia, "the EPSG Geodetic Parameter Dataset (also EPSG registry) is a public registry of geodetic datums, spatial reference systems, Earth ellipsoids, coordinate transformations and related units of measurement, originated by a member of the European Petroleum Survey Group (EPSG) in 1985."
To find the EPSG code for a country, search online using queries such as, "What is the recommended EPGS code for Australia?". Here's a list of the codes I used in this project:

If you want to use a long list of codes, consider entering them as a dictionary with the country name as the key and the code as the value.
Selecting and Projecting Shapefiles
Geopandas used to ship with the Natural Earth public domain dataset, which includes shapefiles for country and state boundaries. This was deprecated in early 2024, so we've got to download the file manually.
Fortunately, this is easy. Just navigate to this site and click the download link highlighted in yellow below:

Move the zipped file into the folder containing your Python script or notebook, then run the following code:
# Download the shapefile used below from:
# https://www.naturalearthdata.com/downloads/110m-cultural-vectors/
world = gpd.read_file('ne_110m_admin_0_countries.zip')
# Select countries from GeoDataFrames:
front = world[world['SOVEREIGNT'] == COUNTRY2]
back = world[world['SOVEREIGNT'] == COUNTRY1]
# Project to a CRS before calculating the centroid:
front = front.to_crs(CRS_2)
back = back.to_crs(CRS_1)
The shapefile is assigned to the world
variable and loaded using GeoPandas' read_file()
method. It's too big to show here, but if you want to see the contents, enter world.head()
, just as you would for a pandas DataFrame.
The front
country will be displayed in red in the foreground of the display. The back
country will be displayed in the background in black and white.
The final two lines convert the shapefiles to the proper coordinate reference system, as specified by the EPSG codes.
Rotating and Shifting Boundary Polygons
The following code finds the centroid of each country polygon, rotates the front
(foreground) polygon, if necessary, and shifts it so that the two polygons have the same center point.
# Calculate the geographical center of each country:
front_center = front.geometry.centroid.iloc[0]
back_center = back.geometry.centroid.iloc[0]
# Rotate COUNTRY2 around center by ROTATE value (0 = no rotation):
front_rotated = front.set_geometry(front.rotate(ROTATE, origin='centroid'))
# Shift foreground country to overlap background country:
front_shifted = front_rotated.set_geometry(front_rotated.translate(
xoff=back_center.x-front_center.x,
yoff=back_center.y-front_center.y))
The ROTATE
constant lets you rotate the foreground country's polygon to improve its placement within the background polygon. Values should be between 0 and 180. Negative values will rotate the polygon clockwise.
It's also possible to shift the foreground polygon vertically and horizontally. We'll cover that option later in the section on US States.
Plotting the Maps and Checking the Areas
The next cell plots the maps and then prints the area of each country polygon as a quality control step. The latter helps ensure we're using viable EPSG codes.
# Plot the background country (COUNTRY1):
base = back.plot(color='white', edgecolor='black')
# Plot the foreground country (COUNTRY2):
front_shifted.plot(ax=base, color='red', alpha=0.5)
# Add a legend:
red_patch = mpatches.Patch(color='red', alpha=0.5, label=COUNTRY2)
plt.legend(handles=[red_patch], loc='best')
# Turn off axis labels and ticks:
plt.xticks([])
plt.yticks([])
# Add a title:
plt.title(f"{COUNTRY2} vs. {COUNTRY1} Size Comparison")
# Save plot (optional):
plt.savefig(f'{COUNTRY2} vs {COUNTRY1}.png', dpi=600)
plt.show();
# Print the country area beneath the map:
front['area_km2'] = front.area / 10**6
back['area_km2'] = back.area / 10**6
print(f"{COUNTRY2} area = {front.area_km2.to_string(index=False)} sq km")
print(f"{COUNTRY1} area = {back.area_km2.to_string(index=False)} sq km n")
The first two steps plot the background and foreground boundaries. We then add a legend. A key step here is to use Matplotlib's mpatches.Patch()
class to make a red rectangle to designate the foreground country.
We then turn off the x- and y-axis ticks, add a title, and save the plot.
Finally, we calculate the area of each polygon and print the results. We can then search online for each country area to see if the results match. They won't be exact, as the boundary polygons we're using are simplified, but they'll be close enough to ensure our EPSG codes aren't off the mark.
Here are the results for comparing Australia to China:

Australia area = 7.593688e+06 sq km
China area = 1.028958e+07 sq km
A Note on Plotting the United Kingdom
Our shapefile includes the Falkland Islands with the United Kingdom, which can mess up your centroid calculation. To fix this, remove the "Falkland Islands" under the "ADMIN" column, as so:
# Remove the Falkland Islands:
world = gpd.read_file('ne_110m_admin_0_countries.zip')
world = world[(world.ADMIN != 'Falkland Islands')]
Next, we'll compare US states to other states and countries.
The US State Comparison Code
We must download a new shapefile from the Natural Earth site to compare US states. Click the link highlighted in yellow below and move the file to the folder containing your Python script or notebook:

Remember, you don't need to unzip the file to use it.
Comparing States to States
Here's the full code for comparing one state to another (in this case, California vs. Texas). It's essentially the same as before but with a new shapefile and a few tweaks to variable names:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import geopandas as gpd
# Assign constants:
STATE1 = 'Texas' # Plots in the background (back).
STATE2 = 'California' # Plots in the foreground (front).
ROTATE = 13 # Direction and amount to rotate STATE2
CRS1 = 'EPSG:5070' # Coordinate Reference System for STATE1
CRS2 = 'EPSG:5070' # Coordinate Reference System for STATE2
# Load the shapefile:
usa = gpd.read_file('ne_110m_admin_1_states_provinces.zip')
# Select states from GeoDataFrame:
front = usa[(usa.name == STATE2)]
back = usa[(usa.name == STATE1)]
# Project to a CRS before calculating the centroid:
front = front.to_crs(CRS2)
back = back.to_crs(CRS1)
# Calculate the geographical center of each state:
front_center = front.geometry.centroid.iloc[0]
back_center = back.geometry.centroid.iloc[0]
# Rotate STATE2 around center by ROTATE value (0 = no rotation):
front_rotated = front.set_geometry(front.rotate(ROTATE, origin='centroid'))
# Shift foreground state to overlap background state:
front_shifted = front_rotated.set_geometry(front_rotated.translate(
xoff=back_center.x-front_center.x,
yoff=back_center.y-front_center.y))
# Plot the background state (STATE1):
base = back.plot(color='white', edgecolor='k', lw=3)
# Plot the foreground state (STATE2):
front_shifted.plot(ax=base, color='red', alpha=0.5)
# Add a legend:
red_patch = mpatches.Patch(color='red', alpha=0.5, label=STATE2)
plt.legend(handles=[red_patch], loc='best')
# Turn off axis labels and ticks:
plt.xticks([])
plt.yticks([])
# Add a title:
plt.title(f"{STATE2} vs. {STATE1} Size Comparison")
plt.savefig(f'{STATE2}_vs_{STATE1}.png', dpi=600)
plt.show();
# Print the area of each beneath the map:
front['area_km2'] = front.area / 10**6
back['area_km2'] = back.area / 10**6
print(f"{STATE2} area = {front.area_km2.to_string(index=False)} sq km")
print(f"{STATE1} area = {back.area_km2.to_string(index=False)} sq km n")

Comparing States to Countries
To compare a state (or states) to another country, we must load both the country and state shapefiles. Here's an example comparing Australia to the Lower 48:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import geopandas as gpd
# Assign constants:
COUNTRY1 = 'USA' # Plots in the background (back).
COUNTRY2 = 'Australia' # Plots in the foreground (front).
ROTATE = 0 # Direction and amount to rotate COUNTRY2
CRS_1 = 'EPSG:5070' # Coordinate Reference System for COUNTRY1
CRS_2 = 'EPSG:3112' # Coordinate Reference System for COUNTRY2
# Load the shapefiles:
world = gpd.read_file('ne_110m_admin_0_countries.zip')
usa = gpd.read_file('ne_110m_admin_1_states_provinces.zip')
# Exclude Alaska and Hawaii from USA:
usa = usa[(usa.name != 'Alaska') & (usa.name != 'Hawaii')]
# Select regions from GeoDataFrames:
front = world[world['SOVEREIGNT'] == COUNTRY2]
back = usa
# Project to a CRS before calculating the centroid:
front = front.to_crs(CRS_2)
back = back.to_crs(CRS_1)
# Calculate the geographical center of each region:
front_center = front.geometry.centroid.iloc[0]
back_center = back.geometry.centroid.iloc[0]
# Rotate front around center by ROTATE value (0 = no rotation):
front_rotated = front.set_geometry(front.rotate(ROTATE, origin='centroid'))
# Shift front to overlap back:
front_shifted = front_rotated.set_geometry(front_rotated.translate(
xoff=back_center.x-front_center.x,
yoff=back_center.y-front_center.y))
# Plot the background region:
base = back.plot(color='white', edgecolor='black')
# Plot the foreground region:
front_shifted.plot(ax=base, color='red', alpha=0.5)
# Add a legend:
red_patch = mpatches.Patch(color='red', alpha=0.5, label=COUNTRY2)
plt.legend(handles=[red_patch], loc='best')
# Turn off axis labels and ticks:
plt.xticks([])
plt.yticks([])
# Add a title:
plt.title(f"{COUNTRY2} vs. {COUNTRY1} Size Comparison")
plt.show();

There's a problem here. Australia is too far north for a good comparison. To fix this, we need to shift Australia downward.
Shifting Boundary Polygons Vertically and Horizontally
We've already shifted the foreground polygon by setting its centroid equal to the centroid of the background polygon. To further adjust it in the north-south and east-west directions, we can assign additional constants:
SHIFT_SOUTH = 550_000 # in meters
SHIFT_WEST = 350_000 # in meters
As with the ROTATE
constant, the values shown were derived through trial and error.
We then edit the line of code that shifts the front polygon to include the new constants:
front_shifted = front_rotated.set_geometry(front_rotated.translate(
xoff=back_center.x-front_center.x-SHIFT_WEST,
yoff=back_center.y-front_center.y-SHIFT_SOUTH))
This shifts Australia down 550 km and left 350 km:

This makes for a much better comparison.
Interestingly, when I lived in Melbourne in the early '90s, I didn't realize Australia was as big as the contiguous US. Then I flew from Melbourne in the southeast to Perth on the west coast. The flight took slightly over 4 hours, similar to a trip from Atlanta to San Diego. Some quick math (4 h x 500 mph) revealed I had flown over 2,000 miles. No wonder Australia is classified as a continent!
Summary
Python's GeoPandas library makes it easy to manipulate geospatial data to create interesting maps. GeoPandas' tabular GeoDataFrame builds on the popular pandas DataFrame by including a "Geometry" column that holds the coordinates of vector data such as polygons and lines. This makes GeoDataFrames "spatially aware."
The "Geometry" column is typically populated using a geospatial vector data format known as a shapefile. With a GeoDataFrame, you can use a coordinate reference system to project the geospatial data onto a flat map, rotate and shift the data, calculate areas, plot the data, and more.
Thanks!
Thanks for reading and please follow me for more Quick Success Data Science projects in the future.