Precisely Compare Geographical Regions with GeoPandas
QUICK SUCCESS DATA SCIENCE

I previously wrote an article on how to compare the sizes of geographical regions, such as countries and states, by overlaying their maps. Here's an example using the state of Texas and the country of Ukraine:

This approach is effective, except when regions have significantly different shapes. For instance, in the following figure, which is larger: Chile or Texas?

This issue led Medium member Anurag to leave the following (paraphrased) comment on the article:
"Is it possible to fill the area of other countries with the area of Texas? Unlike just overlapping the two countries on top of each other. I think it would be easier to observe and understand."
This was an intriguing suggestion, and the answer is yes. You can do it. Here's an example where Chile, the larger region, is ‘filled' with the area of Texas:

Now there's no doubt that Chile is larger than Texas. This is a handy method for when the shapes of the two regions are very different or the areas are very similar.
In this Quick Success Data Science project, I'll show you how to use Python to fill the map of one region with the area of another.
The Process
Here's the (very) high-level process we'll use:
- Get the shapefiles (polygons) of the two regions
- Calculate their areas
- Plot the boundary (map) of the largest region
- Fill the map with a colored polygon
- Whittle away at the polygon until it's the size of the smaller region
We'll use shapefiles and Python's Geopandas library to accomplish these tasks. For an overview of shapefiles and GeoPandas, including installation instructions, please visit my original article:
The Code
The following code takes as input a Shapefile of a US state and a country, determines which is the largest, and plots the area of the smaller region inside the map of the larger region, filling it with color from left to right. While this should be appropriate for most countries, we'll also look at a bottom-to-top filling approach for vertically oriented countries and states like Chile and California.
Importing Libraries
GeoPandas, like pandas, is built on Matplotlib, so we'll import Matplotlib for plotting. The shapely library will help us slice away the color-filled area until it matches the smaller region. Both Matplotlib and shapely are dependencies of GeoPandas, and are installed automatically with GeoPandas.
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import geopandas as gpd
from shapely.geometry import box
Assigning Constants
For convenience, we'll assign constants at the program's top, where they'll be easy to find and change. These include the state and country name, a coordinate reference system (CRS) for projecting from a 3D globe to a 2D map, and an integer that we'll use to set the width of the slices we'll shave from the color-filled polygon. The larger this number, the finer slice, and the more accurately we can match the area of the smaller region.
# Constants
STATE = 'Texas'
COUNTRY = 'Chile'
CRS_STATE = 'EPSG:5070' # Coordinate Reference System
CRS_COUNTRY = 'EPSG:20048' # Coordinate Reference System
DIVISOR = 1_000 # Larger values increase accuracy and run time
See the original article referenced previously for more on coordinate reference systems and EPSG codes.
HINT: To find the EPSG code for a country, search online using queries such as, "What is the recommended EPGS code for Australia?".
Selecting and Projecting Shapefiles
GeoPandas used to ship with the Natural Earth public domain dataset, which includes shapefiles for country and state boundaries. This was deprecated in early 2024, so we've got to download the file manually.
Fortunately, this is easy. Just navigate to this site and click the download link highlighted in yellow below to get the state boundaries :

To get the country boundaries, click the highlighted link below:

Move the zipped files into the folder containing your Python script or notebook, then run the following code:
# Download the zipped files used below from:
# https://www.naturalearthdata.com/downloads/110m-cultural-vectors/
world = gpd.read_file('ne_110m_admin_0_countries.zip')
usa = gpd.read_file('ne_110m_admin_1_states_provinces.zip')
# Rename columns for consistency
world = world.rename(columns={'ADMIN': 'name'})
# Create GDFs for each region
state = usa[usa.name == STATE]
country = world[world['SOVEREIGNT'] == COUNTRY]
# Project to CRS
state = state.to_crs(CRS_STATE)
country = country.to_crs(CRS_COUNTRY)
The country shapefile is assigned to the world
variable and loaded using GeoPandas' read_file()
method. It's too big to show here, but if you want to see the contents, enter world.head()
, just as you would for a pandas DataFrame.
The world
and usa
GeoDataFrames use different column names for the name of the country and state, respectively. This will prove inconvenient later, so we'll set the world
column name ("ADMIN") to "name".
Next, we create the state and country GeoDataFrames based on the STATE
and COUNTRY
constants.
The final two lines convert the shapefiles to the proper coordinate reference system, as specified by the EPSG codes.
Determining the Largest and Smallest Regions
Now we calculate the area of each region and determine which is the biggest and smallest.
# Calculate areas
area1 = state.geometry.area.sum()
area2 = country.geometry.area.sum()
# Determine biggest and smallest regions
biggest, smallest = ((state, country) if area1 >= area2
else (country, state))
What's happening here? Well, the state.geometry.area.sum()
line accesses the geometry
column of the state
GeoDataFrame, which contains the geometric shapes (polygons) representing the state.

The area
property calculates the area of each polygon in the geometry
column. The result is a GeoSeries of area values. The sum()
method then sums up the areas in the GeoSeries to get the total area of the geometries in the state
GeoDataFrame.
The total area of the geometries will be in the units of the CRS. In this case, we're using square meters.
Filling the Map with the Smaller Area
Next, we fill the larger area map with a form-fitting colored polygon. We then iteratively slice away the colored polygon until its area matches the area of the smaller region, represented by the target_area
variable.
# Create filled area
filled_area = biggest.unary_union
# Define target area
target_area = smallest.geometry.area.sum()
# Remove slices until the remaining area matches target
remaining_area = filled_area
while remaining_area.area > target_area:
minx, miny, maxx, maxy = remaining_area.bounds
slice_width = (maxx - minx) / DIVISOR
slice_to_remove = box(maxx - slice_width, miny, maxx, maxy)
remaining_area = remaining_area.difference(slice_to_remove)
# Convert remaining area to GeoDataFrame
remaining_area_gdf = gpd.GeoDataFrame({'geometry': [remaining_area]},
crs=biggest.crs)
First, we create a polygon the same size and shape as the map of the biggest region. The unary_union
property merges all the geometries in the biggest
GeoDataFrame into one single geometry, which is then assigned to the variable filled_area
. The target_area
represents the area of the smallest region.
The remaining_area
variable is a copy of the filled_area
that we can modify until its area matches the target_area
value.
The while
loop is the secret sauce that makes all this possible. It runs as long as the remaining area exceeds the target area. The loop body consists of the following steps:
remaining_area.bounds
retrieves the bounding box coordinates (minx, miny, maxx, maxy) of the currentremaining_area
.slice_width
is calculated as a fraction of the width of the bounding box, determined by theDIVISOR
constant.slice_to_remove
creates a vertical rectangular slice (usingshapely.geometry.box
) at the right edge of the bounding box using the width stored inslice_width
.remaining_area.difference(slice_to_remove)
subtracts theslice_to_remove
fromremaining_area
, effectively removing a small slice from the right edge.
Here's a view of the biggest, remaining area, and bounding box polygons at the start of the loop:

Once the loop conditions are met, the remaining_area
is converted into a GeoDataFrame (remaining_area_gdf
) with the same CRS as biggest
. Now we're ready to plot.
Plotting the Results
We'll use Matplotlib to overlay the map of the biggest region with the remaining area polygon.
# Plot
fig, ax = plt.subplots(figsize=(10, 10))
biggest.boundary.plot(ax=ax, color='black')
remaining_area_gdf.plot(ax=ax, color='red', alpha=0.5)
# Add legend
smallest_name = smallest['name'].iloc[0]
red_patch = mpatches.Patch(color='red',
alpha=0.5,
label=smallest_name)
plt.legend(handles=[red_patch], loc='best')
# Customize plot
plt.xticks([])
plt.yticks([])
plt.title(f"{biggest['name'].iloc[0]} vs. {smallest_name} Area Comparison")
plt.show()
The biggest.boundary
code accesses the boundary
property of the biggest
GeoDataFrame. This property returns a GeoSeries containing the outlines of each geometry in the GeoDataFrame, which forms the outline of Texas.
The Patch
class helps us manually build a legend with a rectangular patch of color attached to a label.
Here's the output, as shown previously:

The area of the pink-colored region is equivalent to the area of Texas.
Given Chile's aspect ratio, filling it from bottom to top might look better. To switch the fill direction, change the slice_height
and slice_to_remove
lines in the while
loop as below:
slice_height = (maxy - miny) / DIVISOR
slice_to_remove = box(minx, maxy - slice_height, maxx, maxy)
Now, the color-fill extends from bottom to top, like water filling a pipette:

Fine-tuning the Results
Remember our DIVISOR
constant? This value sets the width of the slices removed from the remaining_area
polygon with each iteration of the while
loop. If this value is too small, the slices will be thick, and we won't be able to reproduce the target area. If it's too large, the program might run for an unnecessarily long time.
To check the performance of the constant, we'll print out the final areas of the target value and the remaining area polygon:
# Check smallest vs. remaining area
print(f"Smallest area: {target_area:,} sq. meters")
print(f"Remaining area: {remaining_area.area:,} sq. meters")
Smallest area: 691,287,205,135.4048 sq. meters
Remaining area: 691,007,387,015.6472 sq. meters
That's pretty close. If you want more precision, change the DIVISOR
constant to a larger number, such as 1500 or 2000.
Summary
To accurately compare the sizes of two geographical regions, we used GeoPandas and shapefiles to fill the map outline of one region with a colored polygon representing the smaller region. The example code was designed to compare U.S. states to other countries but can be easily adapted for state-to-state or country-to-country comparisons.
If you found this article useful and would like to support me, please buy me a coffee. I promise I'll drink it.

Thanks!
Thanks for reading and clapping and please follow me for more Quick Success Data Science projects in the future.