Visualize Data Ranges with Matplotlib

Author:Murphy  |  View: 30017  |  Time: 2025-03-23 12:37:13
Hurricane from Space by Leonardo AI DreamShaper_v7 model

Plotting discrete data is straightforward; representing ranges of data is more involved. Fortunately, Python's matplotlib library has a built-in function, fill_between(), that lets you easily visualize data ranges. In this Quick Success Data Science project, we'll use it to benchmark the National Oceanic and Atmospheric Administration's annual hurricane outlook.

The Dataset

Every May, NOAA releases its "Atlantic Hurricane Outlook" report for the June-November hurricane season. These outlooks include predicted ranges for named storms, hurricanes, and major hurricanes (defined as Category 3 and higher). You can find an example report for 2021 here [1]. NOAA/National Weather Service data is provided by the US government as open data, free to use for any purpose.

In order to benchmark the accuracy of these forecasts, we'll use the annual hurricane season summaries provided by Wikipedia. These summaries provide the actual number of storms and hurricanes for each year. You can find the 2021 season entry here [2]. Wikipedia pages are provided under a CC BY-SA 4.0 license.

Wikipedia also includes lists for _La Niña and El Niño_ events [3][4]. These represent weather patterns that occur in the Pacific Ocean every few years. During La Niña years, the water in the eastern Pacific is colder than normal, cooling the air above it. The opposite occurs in El Niño years.

The La Niña pattern favors stronger hurricane activity in the Atlantic basin while El Niño suppresses hurricane development [5]. To check this, we'll also color-code our plot for these events.

For convenience, I've already compiled all this information for the years 2001–2022 and stored it as a CSV file in this Gist.

NOAA issues an updated hurricane forecast every August, so you need to take care when selecting data and referencing predictions. We'll be using the May outlooks.

Installing Libraries

We'll use pandas for data handling and matplotlib for plotting. Install them with either:

conda install matplotlib pandas

or

pip install matplotlib pandas

The Code

The following code was written in JupyterLab and is described by cell.

Importing Modules

Besides performing data analysis and plotting, we're going to make a custom marker to represent a hurricane. To do this, we'll need to import NumPy, Python's numerical analysis package, and a matplotlib module known as mpath, used for working with polylines.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.path as mpath
import pandas as pd

Loading the Dataset

The CSV file contains low and high values for both predicted hurricanes (H) and major hurricanes (MH). It also includes the actual number of hurricanes and major hurricanes, and whether or not the season fell in a La Niña or El Niño event. A transitional year is labeled as a "Weak Event."

df = pd.read_csv('https://bit.ly/44YgahT')
df.head(3)

Defining a Function to Draw a Hurricane Marker

While we could use a simple circle to post the actual number of hurricanes on our scatter plot, wouldn't a classic hurricane icon look so much better?

Unfortunately, matplotlib doesn't come with a hurricane marker. However, code to draw a hurricane marker was provided in a Stack Overflow answer which I've reproduced below (Stack Overflow content is cc-wiki licensed) [6].

This function uses matplotlib's mpath module to build and return a [Path](https://matplotlib.org/stable/api/path_api.html) object representing a series of line and curve segments. How this code works isn't important for this project, but if you want to see a detailed explanation, visit the Stack Overflow link at the start of the snippet.

# The following code was adapted from Stack Overflow:
# https://stackoverflow.com/questions/44726675/custom-markers-using-python-matplotlib
# Asked by: https://stackoverflow.com/users/5689281/kushal
# Answered by: https://stackoverflow.com/users/4124317/importanceofbeingernest

def get_hurricane_symbol():
    """Return a hurricane warning symbol as a matplotlib path."""
    # Create a NumPy array of coordinates for one half of the symbol:
    coordinates = np.array([[2.444, 7.553],
                            [0.513, 7.046],
                            [-1.243, 5.433],
                            [-2.353, 2.975],
                            [-2.578, 0.092],
                            [-2.075, -1.795],
                            [-0.336, -2.870],
                            [2.609, -2.016]])

    # Shift the x-coordinates:
    coordinates[:, 0] -= 0.098

    # Define path codes (1 = MOVETO, 2 = LINETO):
    codes = [1] + [2] * (len(coordinates) - 2) + [2]

    # Append the negated, reversed coordinates to form the mirrored half:
    coordinates = np.append(coordinates, -coordinates[::-1], axis=0)

    # Duplicate the codes:
    codes += codes

    # Create and return the matplotlib path:
    return mpath.Path(3 * coordinates, codes, closed=False)
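
For readers curious about the numeric codes in the function above, 1 and 2 are the values of matplotlib's Path.MOVETO and Path.LINETO constants. The following minimal sketch, which is not part of the Stack Overflow answer, builds a much simpler marker using the named constants. The diamond shape and the name get_diamond_symbol are purely illustrative.

```python
import numpy as np
import matplotlib.path as mpath


def get_diamond_symbol():
    """Return a diamond-shaped matplotlib path (illustrative only)."""
    vertices = np.array([[0.0, 1.0],
                         [1.0, 0.0],
                         [0.0, -1.0],
                         [-1.0, 0.0],
                         [0.0, 1.0]])
    # Move the "pen" to the first vertex, then draw straight lines:
    codes = [mpath.Path.MOVETO] + [mpath.Path.LINETO] * (len(vertices) - 1)
    return mpath.Path(vertices, codes)


diamond = get_diamond_symbol()
```

Like the hurricane symbol, the resulting path can be passed to the marker argument of plt.plot() or plt.scatter().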

Plotting Actual Hurricanes vs. Predicted Hurricanes

The code below uses the matplotlib [fill_between()](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.fill_between.html) method to shade NOAA's predicted range of hurricanes for each year. It takes the x values (here, the Year column), the lower bounds as y1, and the upper bounds as y2. Adding a label argument ensures that the shaded range appears in the legend.

# Call the function to build the hurricane marker:
symbol = get_hurricane_symbol()

# Initialize the figure:
plt.figure(figsize=(10, 4))

# Plot the actual number of hurricanes per year:
plt.plot(df.Year, df['Actual H'], 
         label='Actual Value', 
         marker=symbol, 
         markersize=17, 
         c='darkblue', 
         linestyle='None', 
         lw=1)

# Shade NOAA's predicted range of hurricanes for each year:
plt.fill_between(x=df.Year, 
                 y1=df['Predicted H Low'], 
                 y2=df['Predicted H High'],
                 alpha=0.3, 
                 label='Predicted Range')

plt.xlabel('Year')
plt.ylabel('Number of Hurricanes')
plt.legend(loc='lower right')
plt.grid(True, c='lightgrey', alpha=0.5)
plt.title('Actual Number of Atlantic Hurricanes vs. \nNOAA May Prediction (2001-2022)');

# Optional code to save the figure:
# plt.savefig('range_plot.png', bbox_inches='tight', dpi=600)
The range plot (by author)

This simple yet elegant plot is filled with useful information. For instance, over the last 22 years, the actual number of hurricanes has landed within the predicted ranges 11 times. This is the same accuracy as flipping a coin. Lately, NOAA has started using wider ranges, which increases accuracy but decreases precision.
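
A count like the one quoted above can be reproduced with a few lines of pandas. This is a minimal sketch, assuming the column names used in the plotting code ('Actual H', 'Predicted H Low', 'Predicted H High'). The four rows in sample are made-up stand-ins, not NOAA data, and treating the range bounds as inclusive is an assumption.

```python
import pandas as pd

# Hypothetical stand-in rows; the real CSV uses these same column names.
sample = pd.DataFrame({
    'Predicted H Low': [5, 6, 6, 7],
    'Predicted H High': [8, 9, 10, 11],
    'Actual H': [6, 10, 8, 12],
})

# True where the actual count lies inside the range (bounds included):
in_range = sample['Actual H'].between(sample['Predicted H Low'],
                                      sample['Predicted H High'])
hits = int(in_range.sum())
hit_rate = hits / len(sample)
print(f'{hits} of {len(sample)} seasons within range ({hit_rate:.0%})')
```

To run the check on the real data, replace sample with the df loaded earlier.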

Changing the Fill Style

I really like the previous fill style for the predicted range, but there are alternatives. In this example, we pass a step argument to the fill_between() method. Now, instead of a continuous polygon, we get discrete vertical bars.

plt.figure(figsize=(10, 4))
plt.plot(df.Year, df['Actual H'], 
         label='Actual Value', 
         marker=symbol, 
         markersize=17, 
         c='darkblue', 
         linestyle='None', 
         lw=1)
plt.fill_between(x=df.Year, 
                 y1=df['Predicted H Low'], 
                 y2=df['Predicted H High'],
                 step='mid',
                 alpha=0.3, 
                 label='Predicted Range')
plt.xlabel('Year')
plt.ylabel('Number of Hurricanes')
plt.legend(loc='lower right')
plt.grid(True, c='lightgrey', alpha=0.5)
plt.title('Actual Number of Atlantic Hurricanes vs. \nNOAA May Prediction (2001-2022)');
The range plot using the "step" argument (by author)
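
Besides 'mid', the step argument also accepts 'pre' and 'post', which place the step before or after each x position rather than halfway between neighbors. The sketch below draws all three side by side so you can compare them. The x, low, and high values are made up for illustration.

```python
import matplotlib.pyplot as plt

# Made-up values, used only to show the shape of each fill:
x = [1, 2, 3, 4]
low = [2, 3, 1, 2]
high = [4, 6, 3, 5]

fig, axes = plt.subplots(1, 3, figsize=(10, 3), sharey=True)
for ax, step in zip(axes, ['pre', 'mid', 'post']):
    ax.fill_between(x, low, high, step=step, alpha=0.3)
    ax.plot(x, low, 'k.')   # mark the data points for reference
    ax.plot(x, high, 'k.')
    ax.set_title(f"step='{step}'")
```

For yearly data, 'mid' is usually the natural choice because each bar is centered on its year.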

Adding El Niño and La Niña Events

To evaluate the impact of El Niño and La Niña events on the number and intensity of hurricanes, let's make use of the "Event" column of the DataFrame.

First, we need to make a dictionary that maps the event to a color. Since La Niña represents a cooling event, we'll use blue. El Niño warming events will be red, and weak events will be nondescript grey.

We'll add a separate custom legend for the events just beneath the figure title. Note the use of $\u25CF$ (the Unicode black circle character) to draw circles. This is a symbol from the handy STIX font collection.

# Plot the predicted ranges and color the actual values by event.
# Define a dictionary to map event names to matplotlib colors:
color_mapping = {'Nina': 'blue', 
                 'Nino': 'red', 
                 'Weak Event': 'grey'}
# Map the Event column to colors. Use black ('k') if the event isn't in the dictionary:
df['colors_mapped'] = df['Event'].apply(lambda x: color_mapping.get(x, 'k'))

plt.figure(figsize=(10, 4))
plt.scatter(df.Year, df['Actual H'], 
            label='Actual Value', 
            marker=symbol, 
            s=300, 
            c=df.colors_mapped, 
            linestyle='None', 
            lw=1)
plt.fill_between(x=df.Year, 
                 y1=df['Predicted H Low'], 
                 y2=df['Predicted H High'], 
                 alpha=0.3, 
                 label='Predicted Range')
plt.xlabel('Year')
plt.ylabel('Number of Hurricanes')
plt.legend(loc='lower right')
plt.grid(True, c='lightgrey', alpha=0.5)

# Add event legend as title:
plt.suptitle('Actual Number of Atlantic Hurricanes vs. NOAA May Prediction (2001-2022)')
plt.figtext(0.4, 0.9, '$\u25CF$ La Nina', fontsize='medium', c='b', ha='right')
plt.figtext(0.5, 0.9, '$\u25CF$ El Nino', fontsize='medium', c='r', ha='center')
plt.figtext(0.6, 0.9, '$\u25CF$ Weak Event', fontsize='medium', c='grey', ha='left');
Range plot with markers colored by weather event (by author)

These results appear to support the theory that El Niño events suppress hurricane formation in the Atlantic, at least versus La Niña events. To see if they also impact hurricane intensity, let's plot the major hurricane data.

Plotting Major Hurricanes

Major hurricanes are defined as those rated Category 3 or higher. The following code updates the plot for these values.

plt.figure(figsize=(10, 4))
plt.scatter(df.Year, df['Actual MH'], 
            label='Actual Value', 
            marker=symbol, 
            s=300, 
            c=df.colors_mapped, 
            linestyle='None', 
            lw=1)
plt.fill_between(x=df.Year, 
                 y1=df['Predicted MH Low'], 
                 y2=df['Predicted MH High'], 
                 alpha=0.3, 
                 label='Predicted Range')
plt.xlabel('Year')
plt.ylabel('Number of Major Hurricanes (Cat 3+)')
plt.legend(loc='lower right')
plt.grid(True, c='lightgrey', alpha=0.5)

# Add event legend as title:
plt.suptitle('Actual Number of Major Atlantic Hurricanes vs. NOAA May Prediction (2001-2022)')
plt.figtext(0.4, 0.9, '$\u25CF$ La Nina', fontsize='medium', c='b', ha='right')
plt.figtext(0.5, 0.9, '$\u25CF$ El Nino', fontsize='medium', c='r', ha='center')
plt.figtext(0.6, 0.9, '$\u25CF$ Weak Event', fontsize='medium', c='grey', ha='left');
Range plot with major hurricanes color-coded by weather event (by author)

With the exception of 2004, which some sources classify as a weak event, this chart supports the idea that hurricane formation is suppressed during El Niño events [7]. Forecast accuracy is also slightly better for major hurricanes, with 13 of 22 falling within the predicted range.
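
If you'd like a number to go with the visual impression, one option is to average the counts by event type. This is a rough sketch, assuming the 'Event', 'Actual H', and 'Actual MH' column names used above. The rows in sample are hypothetical, so the printed means say nothing about real hurricane seasons.

```python
import pandas as pd

# Hypothetical rows; the real CSV uses these same column names.
sample = pd.DataFrame({
    'Event': ['Nina', 'Nino', 'Nina', 'Weak Event', 'Nino', 'Weak Event'],
    'Actual H': [10, 4, 8, 7, 6, 5],
    'Actual MH': [5, 1, 3, 3, 2, 2],
})

# Average hurricanes and major hurricanes for each event type:
event_means = sample.groupby('Event')[['Actual H', 'Actual MH']].mean()
print(event_means)
```

Swap sample for df to run this on the real data. With only 22 seasons split across three groups, treat the averages as suggestive rather than conclusive.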

Drawing Ranges Using Vertical Lines

Another way to plot ranges is to use matplotlib's vlines() method to draw vertical lines. This is an attractive alternative to the fill_between() method, though it's more labor-intensive and doesn't automatically include the range in the legend.

# Redraw plot with vertical lines for ranges:
plt.figure(figsize=(10, 4))

# Use a scatter plot for actual values:
plt.scatter(df.index, df['Actual H'], 
            label='Actual Value', 
            marker=symbol, 
            c='darkblue', 
            s=350)

# Draw vertical lines for the predicted ranges:
for i, row in df.iterrows():
    plt.vlines(x=i, 
               ymin=row['Predicted H Low'], 
               ymax=row['Predicted H High'], 
               alpha=0.4, 
               lw=6, 
               zorder=0)

x = range(len(df))
plt.xticks(x, df.Year, rotation=90)
plt.xlabel('Year')
plt.ylabel('Number of Hurricanes')
plt.legend(loc='lower right')
plt.grid(True, color='lightgray', alpha=0.5)
plt.title('Actual Number of Atlantic Hurricanes vs. NOAA May Prediction');
Range plot with ranges expressed as vertical lines (by author)
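
If you want the ranges to show up in the legend, one possible approach is to skip the loop and pass whole columns to a single vlines() call along with a label. The sketch below uses the axes-level API and a few hypothetical rows in place of the real CSV. It's an alternative to the loop above, not a reproduction of it.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; the real CSV uses these same column names.
sample = pd.DataFrame({
    'Year': [2001, 2002, 2003],
    'Predicted H Low': [5, 6, 6],
    'Predicted H High': [8, 9, 10],
    'Actual H': [6, 10, 8],
})

fig, ax = plt.subplots(figsize=(10, 4))
ax.scatter(sample.index, sample['Actual H'], label='Actual Value')

# One call draws every range and registers a single legend entry:
ax.vlines(x=sample.index,
          ymin=sample['Predicted H Low'],
          ymax=sample['Predicted H High'],
          alpha=0.4, lw=6, zorder=0,
          label='Predicted Range')

ax.set_xticks(range(len(sample)))
ax.set_xticklabels(sample['Year'].astype(str), rotation=90)
ax.legend(loc='lower right')
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

Adding a label inside the loop instead would create one legend entry per year, which is why the single call is used here.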

Evaluating the Atlantic Multidecadal Oscillation

We've now covered the fill_between() method, but since we've got all this data at hand, let's take a moment to examine an interesting theory on hurricane formation involving the Atlantic Multidecadal Oscillation (AMO) [8].

The AMO is a feature defined by decades-long variability in North Atlantic sea surface temperatures. Little is known about the AMO; it may represent a persistent periodic climate driver or just a transient feature [9].

The AMO index is calculated by subtracting the global mean sea surface temperature (SST) anomalies from the North Atlantic SST anomalies [9]. When the AMO index is high, sea surface temperatures are warmer than usual, potentially contributing to increased hurricane activity and intensity.
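
As a quick illustration of the subtraction described above, here's a tiny sketch with NumPy. The anomaly values are made up for the example and the variable names are my own; real index calculations work from gridded SST datasets and may involve additional processing.

```python
import numpy as np

# Made-up SST anomalies in degrees Celsius, for illustration only:
north_atlantic_anomaly = np.array([0.50, 0.25, 0.75])
global_mean_anomaly = np.array([0.25, 0.25, 0.25])

# AMO index = North Atlantic anomaly minus global mean anomaly:
amo_index = north_atlantic_anomaly - global_mean_anomaly
print(amo_index)
```

A positive value means the North Atlantic is running warmer than the global average anomaly for that period.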

Because this is a long-wavelength phenomenon, we'll need a dataset that counts hurricanes back to 1920 or so. I've already recorded Wikipedia's hurricane list for this timeframe and stored it at this Gist.

It should be noted that storm counts before the use of airplanes (in the mid-1940s) and satellite data (in the mid-1960s) are less reliable. For example, count estimates between 1886 and 1910 are believed to have an undercount bias of zero to four storms per year [10].

In the following plot, the AMO index boundaries are taken from Wikipedia and NOAA [8][11].

# Load the 1920-2022 hurricane dataset:
df = pd.read_csv('https://bit.ly/3sZnvQX')

# Plot major hurricanes per year with regression line and AMO shading:
plt.figure(figsize=(10, 4))

plt.plot(df.Year, df.MH, 
         label='Actual Value', 
         marker=symbol, 
         markersize=17, 
         c='darkblue', 
         linestyle='None', 
         lw=1)

plt.xlabel('Year')
plt.xticks(range(1920, 2021, 10))
plt.ylabel('Number of Major Hurricanes (Cat 3+)')
plt.grid(True, c='lightgrey', alpha=0.5)
plt.title('Number of Major Atlantic Hurricanes by Year 1920-2022', 
          fontsize=18)

# Add a shaded span for AMO highs:
plt.text(1940, 6.5, 'AMO High', c='firebrick')
plt.axvspan(1926, 1964, 
           color='red', 
           alpha=0.2)

plt.text(2005, 6.5, 'AMO High', c='firebrick')
plt.axvspan(1995, 2022, 
           color='red', 
           alpha=0.2)

# Calculate m (slope) and b (intercept) of linear regression line:
m, b = np.polyfit(df.Year, df.MH, 1)

# Add linear regression line to plot:
plt.plot(df.Year, m*df.Year+b, c='darkblue', ls=':');
Major hurricanes per year including regression line and AMO index high periods (by author)
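
If np.polyfit() is new to you, note that for a degree-1 fit it returns the slope first and the intercept second. The short sketch below checks this on points that lie exactly on a known line. The data are synthetic and chosen only so that the answer is obvious.

```python
import numpy as np

# Points that lie exactly on y = 2x + 1:
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# For degree 1, polyfit returns the slope first, then the intercept:
m, b = np.polyfit(x, y, 1)
fitted = m * x + b
```

Here m comes out at approximately 2 and b at approximately 1, matching the line used to generate the points.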

Here's the same data presented as a bar chart:

Bar chart of major hurricanes with AMO index high periods (by author)
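
The code for the bar chart isn't shown above, but a chart along these lines can be sketched as follows. This is my own approximation, not the code used for the figure. It assumes the Year and MH column names from the 1920-2022 dataset, reuses the AMO high periods from the scatter plot code, and substitutes a few hypothetical rows for the real data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts; the real CSV supplies the Year and MH columns.
sample = pd.DataFrame({'Year': [1990, 1995, 2000, 2005],
                       'MH': [1, 5, 3, 7]})

fig, ax = plt.subplots(figsize=(10, 4))
bars = ax.bar(sample['Year'], sample['MH'], color='darkblue')

# Shade the AMO high periods used in the scatter plot code:
for start, end in [(1926, 1964), (1995, 2022)]:
    ax.axvspan(start, end, color='red', alpha=0.2)

ax.set_xlabel('Year')
ax.set_ylabel('Number of Major Hurricanes (Cat 3+)')
ax.set_title('Number of Major Atlantic Hurricanes by Year')
```

Replace sample with the df loaded from the 1920-2022 Gist to plot the full record.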

And here's the scatterplot for all Atlantic hurricanes over this time period. The AMO effect is less obvious for the frequency of storms.

Total hurricanes per year including regression line and AMO index high periods (by author)

Although scientists recognize the apparent relationship between the AMO index and the number of major hurricanes, there's not enough data at present to draw firm conclusions. As you might expect, the most popular explanation for the increase in major hurricanes in the most recent AMO high is anthropogenic climate change.

Summary

The matplotlib fill_between() method is a handy way to display a range of values on a plot. In this project, we used it to show NOAA's annual hurricane forecasts versus the actual outcomes. In addition, we used matplotlib's mpath module to draw a custom marker to represent hurricanes. The result was an attractive and easy-to-parse infographic.

We also added El Niño, La Niña, and AMO events to our plots. The results supported established observations that El Niño seems to suppress Atlantic hurricanes, and high AMO index events seem to promote them.

Citations

  1. Climate Prediction Center Internet Team, 2001, "NOAA 2021 Atlantic Hurricane Season Outlook," Climate Prediction Center – Atlantic Hurricane Outlook (noaa.gov)
  2. Wikipedia contributors, "2021 Atlantic hurricane season," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=2021_Atlantic_hurricane_season&oldid=1175731221 (accessed September 19, 2023).
  3. Wikipedia contributors, "El Niño," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=El_Ni%C3%B1o&oldid=1174548902 (accessed September 19, 2023).
  4. Wikipedia contributors, "La Niña," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=La_Ni%C3%B1a&oldid=1174382856 (accessed September 19, 2023).
  5. Bell, Gerry, 2014, "Impacts of El Niño and La Niña on the hurricane season," NOAA Climate.gov, Impacts of El Niño and La Niña on the hurricane season | NOAA Climate.gov.
  6. ImportanceOfBeingErnest, "Custom Markers using Matplotlib," Stack Overflow, June 24, 2017, Custom markers using Python (matplotlib) – Stack Overflow (accessed September 19, 2023).
  7. Null, Jan, 2023, "El Niño and La Niña Years and Intensities," Golden Gate Weather Services, El Niño and La Niña Years and Intensities (ggweather.com) (accessed September 19, 2023).
  8. Wikipedia contributors, "Atlantic multidecadal oscillation," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Atlantic_multidecadal_oscillation&oldid=1175329341 (accessed September 19, 2023).
  9. Knudsen, M., Seidenkrantz, MS., Jacobsen, B. et al., "Tracking the Atlantic Multidecadal Oscillation through the last 8,000 years," Nat Commun 2, 178 (2011). https://doi.org/10.1038/ncomms1186.
  10. Wikipedia contributors, "List of Atlantic hurricane records," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=List_of_Atlantic_hurricane_records&oldid=1168214070 (accessed September 19, 2023).
  11. NOAA, 2017, "Atlantic Multidecadal Oscillation Low-Frequency Climate Mode," Atlantic Oceanographic and Meteorological Laboratory, Gulf of Mexico ESR (noaa.gov).

Thanks!

Thanks for reading. My goal is to help you hone your Python skills and have fun doing it. Follow me for more Quick Success Data Science projects in the future.
