Analyzing Lake Mendota Ice Phenology with Python

Author:Murphy | View: 21288 | Time: 2025-03-22 21:18:56

Lake Mendota in Madison Wisconsin is arguably one of the most-studied lakes in the world and boasts an ice phenology record dating back to the 1850's. The Climatology Office at the University of Wisconsin, which currently maintains the record, has made amazing strides to make the data downloadable. However, a simple plot of the ice-on date over time still requires some work by the user to make it legible. This is the reason I created mendotapy, an open source (MIT-licensed) Python package that allows retrieval of this valuable data in an analysis-ready format. It has some built-in utilities to make common transformations a breeze.

I find this dataset interesting from a climate perspective, but I think anybody can find this dataset relevant. For instance, it could be used as a toy dataset to practice regression analysis, statistical forecasting, or spectral analysis.

In this post, I will describe the package, present some plots of the data, and conclude with the challenges I faced making this package. All the code to generate the figures is available in this post and the package code is available on GitHub (contributions are welcome)

Installation and using the package

First, you will need to install the package via pip.

pip install mendotapy

Once installed, it only takes two lines to retrieve an analysis-ready copy of Lake Mendota's ice phenology returned as a Pandas DataFrame.

import mendotapy
df = mendotapy.retrieve_data()

The beauty of this is all the pre-processing has been done for you so you can you jump right into analysis! Figure 1 shows a screenshot of the dataframe.

Figure 1. screenshot of dataframe (image by author)

You will notice that some seasons are duplicated (e.g. 2018–19). These indicate years where the lake thawed twice in a single season. However, the climatology office only provides the duration for the most recent freeze/thaw cycle in the season.

I have also added latitude , longitude , lakename , and lakecode to the dataset for consistency with the National Snow and Ice Data Center's (NSIDC) Global Lake and Rive Ice Phenology. The latter is a great resource for ice phenology data from over 500 lakes and rivers. Although, at the time of writing, the database contains a temporal coverage up to February 2nd, 2018. mendotapy gives you access to the most recent data for Lake Mendota.

Dataframe Accessor

What if you just want the start of the season? For instance, the season is 2023-24 but you just want 2023? Or what if you want the day of the year (DOY) for the ice on date? To facilitate this sort of functionality, a dataframe accessor was added to the package.

This additional functionality is within an accessor named utils

df.utils.season_start() # returns list of the start of the season

df.utils.iceon_doy()    # returns list of ice on day of year
df.utils.iceoff_doy()   # returns list of ice off day of year

This makes plotting the ice on DOY like this:

import matplotlib.pyplot as plt

season_start = df.utils.season_start()
iceon_doy = df.utils.iceon_doy()

plt.plot(season_start, iceon_doy)

Which produces a the graph shown in Figure 2.

Figure 2: ice-on day of year versus time (image by author)

Why does this look so spiky? In most seasons, the ice-on date happens in December, but sometimes it occurs in January if the summer is unusually warm that season. One way to improve this plot is to "wrap" the DOY values into the following year. In other words, add 365 to DOYs occurring in January. to do this in mendotapy , simply use the df.utils.iceon_doy_wrapped() method. So the new code looks like

import matplotlib.pyplot as plt

season_start = df.utils.season_start()
iceon_doy_wrapped = df.utils.iceon_doy_wrapped()

plt.plot(season_start, iceon_doy_wrapped)

Applying this trick, we can clearly see the upward trend in the data

ice-on day of year versus time. In this figure, days of the year greater than 365 or 366 indicate the following year (image by author)

How to make an accessor

Making an accessor like this was new for me and opens up a whole world of possibilities with Pandas (and Xarray too!). Let's dive into how these accessors work.

My approach was to start with a function I know works on a DataFrame. For instance, the function below will return a list with the start of the season.

def season_start(df):
        """start of the season

        Returns
        -------
            list: start of the season.
                  e.g., the 2023-2024 winter season return 2023
        """
        season_list = []
        for index, row in df.iterrows():
            season = int(row["season"].split("-")[0])
            season_list.append(season)
        return season_list

You can then apply the function to a DataFrame.

import mendotapy

# load raw data
df = mendotapy.load()

# apply season_start function
start_of_season = season_start(df)

To turn this into an accessor and use the function like this df.utils.seasons_start() we need to to take the following steps:

Make the function into a class method (class name is irrelevant)
Initialize the class with a DataFrame
Decorate the function to register an accessor

Here is the code that makes this work in mendotapy

@pd.api.extensions.register_dataframe_accessor("utils")
class MendotapyAccessor:
    def __init__(self, df):
        self._df = df

    def season_start(self):
        """start of the season

        Returns
        -------
            list: start of the season.
                  e.g., the 2023-2024 winter season return 2023
        """
        season_list = []
        for index, row in self._df.iterrows():
            season = int(row["season"].split("-")[0])
            season_list.append(season)
        return season_list

The secret sauce that makes all this work is the decorator. The name you pass as an argument will be the name of the accessor.

@pd.api.extensions.register_dataframe_accessor("utils")

Accessors are great for grouping related methods or properties together. However, for mendotapy I took the naive route and dumped all the methods into a single accessor named utils .

Plotting and Analyzing the Data

This is a fun dataset to play around with and one of my go-to-datasets when explaining Climate Change and global warming. It was even featured in a 2018 data visualization battle (r/dataisbeautiful DataViz battle).

Although I did not participate in the battle, I took a stab a making my own visualization. Whether it's "beautiful" or not is still up for debate. My goal was to make a compact, and informative, graphic conveying as much relevant information as possible. Here is what I came up with. Code used to generate this figure is at the bottom of this post.

Summary diagram: Left shows the duration of ice cover in days for each year. Darker circles indicate years where the lake thawed and refroze during the season. Black line indicates the linear trend. Top right shows the ice-off date and bottom right shows the ice-on date (image by author)

You will notice a distinct downward trend in the duration of ice cover, due to the ice-on date occurring later in the season and the ice-off date occurring earlier. This trend is mainly the result of human-caused warming, which has a more dramatic impact on the winter months. See the blog post below for a more detailed explanation of this.

Climate Change in Madison, WI

And for another wintery climate related story, check out this post discussing snowfall trends in Minnesota.

Does record snowfall disprove global warming?

Code used to generate the summary diagram

The code below was used to generate the summary diagram above. I am providing it here in the hope that it serves as a useful reference to help you enhance your plotting skills and create your own insightful visualizations.

from collections import defaultdict as defaultdict
from datetime import datetime

import Pandas as pd
import numpy as np
from sklearn import linear_model as linear_model
import matplotlib as matplotlib
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator

import mendotapy

#========================
# load the data
#========================
df = mendotapy.load()

#========================
# create figure and axes
#========================
fig = plt.figure(figsize=(12, 6))
gs = matplotlib.gridspec.GridSpec(nrows=2, ncols=2,  width_ratios=[1, 1])
ax0 = fig.add_subplot(gs[:, 0])
ax1 = fig.add_subplot(gs[0, 1])
ax2 = fig.add_subplot(gs[1, 1])

#========================
# duration plot
#========================
ax = ax0

x = df.utils.season_start() 
y = df['duration']

ax.scatter(x, y,
           marker='o',
           s=100,
           linewidth=3,
           color=(1, 1, 1, 1),
           edgecolor=[0.7, 0.7, 0.7],
           clip_on=False,
           zorder=2,
           )

# darker edgecolor if thawed during season
duplicates = df['season'].value_counts()[df['season'].value_counts() > 1].index
filt = df['season'].isin(duplicates)
x_thaw = df[filt].utils.season_start() 
y_thaw = df.loc[filt, 'duration']

ax.scatter(x_thaw, y_thaw,
           marker='o',
           s=100,
           linewidth=3,
           color=(1, 1, 1, 1),
           edgecolor=[0.3, 0.3, 0.3],
           clip_on=False,
           zorder=2,
           )

# Hide the right and top spines
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)

# set axis limits
ax.set_ylim([0, 170])
ax.set_xlim([1855, 2022])

# tick marks
ax.set_xticks(np.arange(1860, 2021, 40))
ax.set_yticks(np.arange(14, 180, 14))

# major/minor tick lines
ax.grid(axis='y', which='major', color=[0.8, 0.8, 0.8], linestyle='-')

# make labels
ax.tick_params(axis='x', labelsize=16)
ax.tick_params(axis='y', labelsize=16)

# Turn off the display of all ticks.
ax.tick_params(axis='both',
               which='major',  # Options for both major and minor ticks
               top='off',  # turn off top ticks
               left='off',  # turn off left ticks
               right='off',  # turn off right ticks
               bottom='off',  # turn off bottom ticks
               length=0,
               )

# annotations
title_str = """Duration of Ice Cover (Days)"""
ax.set_title(title_str, fontsize=18, fontweight='bold', ha='center')

x_new = []
y_new = []
for xi, yi in zip(x,y):
    if yi is not None:
        x_new.append(xi)
        y_new.append(yi)

# linear regression
regr_out = linear_model.LinearRegression()
regr_out.fit(np.array(x_new).reshape(-1,1), np.array(y_new).reshape(-1,1))

# plot trend line
ax.plot(x_new, regr_out.predict(np.array(x_new).reshape(-1,1)), color='k', linewidth=3)

# Annotate the plot with the equation of the trend line
trend_per_decade = f"trend: {(regr_out.coef_).item() * 10:.0f} days / decade"
plt.annotate(trend_per_decade, xy=(-0.9, 2.05), xycoords='axes fraction', fontsize=12, color='k')

#========================
# iceoff plot
#========================

ax = ax1

ax.plot(df.utils.season_start(),
        [result if result is None else result + 365 for result in df.utils.iceoff_doy_wrapped()], color=[0.5, 0.5, 0.5], linewidth=2)

# Range ov axes
ax.set_ylim([340, 492])
ax.set_xlim([1855, 2022])

yticks_start_end_step = (340+7, 500, 14)
xticks_start_end_step = (1860, 2021, 40)

# Turn off the display of all ticks.
ax.tick_params(which='both',  # Options for both major and minor ticks
               top='off',  # turn off top ticks
               left='off',  # turn off left ticks
               right='off',  # turn off right ticks
               bottom='off')  # turn off bottom ticks

# Remove x tick marks
plt.setp(ax.get_xticklabels(), rotation=0)

# Hide the right and top spines
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)

# major/minor tick lines
ax.grid(axis='y', which='major', color=[0.8, 0.8, 0.8], linestyle='-')

# Turn off the display of all ticks.
ax.tick_params(axis='both',
               which='major',  # Options for both major and minor ticks
               top='off',  # turn off top ticks
               left='off',  # turn off left ticks
               right='off',  # turn off right ticks
               bottom='off',  # turn off bottom ticks
               length=0,
               )

# Only show ticks on the left and bottom spines
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')

# Don't allow the axis to be on top of your data
ax.set_axisbelow(True)
ax.set_xticks(np.arange(*xticks_start_end_step))
ax.set_yticks(np.arange(*yticks_start_end_step))

ax.set_yticklabels([datetime.fromordinal(doy).strftime('%b-%d')
                    for doy in range(*yticks_start_end_step)])

# Labels
ax.tick_params(axis='x', labelsize=16)
ax.tick_params(axis='y', labelsize=16, labelleft=False, labelright=True)

ax.xaxis.set_ticklabels([])

# Annotations
title_str = """Ice-off date"""
ax.set_title(title_str, fontsize=18, fontweight='bold', ha='center')

#========================
# iceon plot
#========================
ax = ax2
ax.plot(df.utils.season_start(),
        df.utils.iceon_doy_wrapped(), color='k', linewidth=2)

# Range ov axes
ax.set_ylim([320, 440])
ax.set_xlim([1855, 2022])

yticks_start_end_step = (320+7, 440, 14)
xticks_start_end_step = (1860, 2021, 40)

# Turn off the display of all ticks.
ax.tick_params(which='both',  # Options for both major and minor ticks
               top='off',  # turn off top ticks
               left='off',  # turn off left ticks
               right='off',  # turn off right ticks
               bottom='off')  # turn off bottom ticks

# Remove x tick marks
plt.setp(ax.get_xticklabels(), rotation=0)

# Hide the right and top spines
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)

# major/minor tick lines
ax.grid(axis='y', which='major', color=[0.8, 0.8, 0.8], linestyle='-')

# Turn off the display of all ticks.
ax.tick_params(axis='both',
               which='major',  # Options for both major and minor ticks
               top='off',      # turn off top ticks
               left='off',     # turn off left ticks
               right='off',    # turn off right ticks
               bottom='off',   # turn off bottom ticks
               length=0,
               )

# Only show ticks on the left and bottom spines
ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')

# Don't allow the axis to be on top of your data
ax.set_axisbelow(True)

ax.set_xticks(np.arange(*xticks_start_end_step))
ax.set_yticks(np.arange(*yticks_start_end_step))

ax.set_yticklabels([datetime.fromordinal(doy).strftime('%b-%d')
                    for doy in range(*yticks_start_end_step)])

# Labels
ax.tick_params(axis='x', labelsize=16)
ax.tick_params(axis='y', labelsize=16, labelleft=False, labelright=True)

# Annotations
title_str = """Ice-on date"""
ax.set_title(title_str, fontsize=18, fontweight='bold', ha='center')

# save figure
fig.savefig("./lake-mendota.png", bbox_inches='tight', dpi=300)

Challenges making the plot

The trickiest part about making this figure was displaying the date as tick marks. I am sure there is a more elegant solution, but my approach was to format ordinal dates:

Python">yticks_start_end_step = (340+7, 500, 14)
yticks = range(*yticks_start_end_step)

list_of_dates = [datetime.fromordinal(doy).strftime('%b-%d') for doy in yticks]

ax.set_yticklabels(list_of_dates)

datetime(year, month, day).timetuple().tm_yday

The datetime.fromordinal() function uses the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1. I then format these as a date. Although I may be off by a day in leap years, I think this approach should suffice since it is for visualization purposes only.

Conclusions

Creating the mendotapy package was a fun and educational experience, allowing me to deepen my understanding of the datetimepackage and enhance my plotting skills. This post only scratches the surface of what can be done with this rich dataset. Now that mendotapy makes it easy to retrieve and work with the data, I encourage you to dive deeper, use it as a toy dataset for statistical analyses, and share your insights. Contributions and feedback are always welcome on GitHub, and I look forward to seeing how this data sparks your curiosity and innovation.

Tags: Climate Change Climate Data Science Hands On Tutorials Pandas Python