Cyclical Encoding: An Alternative to One-Hot Encoding for Time Series Features
When it comes to training a machine learning model for time series, more often than not you will be looking to use the following time features:
- Hour
- Day of week
- Month
- Week or day of year
- Etc.
It's fairly easy to transform your timestamp column into these kinds of features. After ensuring that you've cast your time column into a datetime object (using pd.to_datetime), you can extract a bunch of time series features using .dt.
import pandas as pd
# Cast the time column to a datetime, then extract features with .dt
df['Datetime'] = pd.to_datetime(df['Datetime'])
df['Hour'] = df['Datetime'].dt.hour
df['Month'] = df['Datetime'].dt.month
df['Dayofweek'] = df['Datetime'].dt.dayofweek
For reference, the dataset (CC0 public domain license) I'll be working with for this example is an hourly electric consumption dataset. Energy consumption data is typically a time series, and the goal is ultimately to forecast future consumption using past data, so this is a great use case. Though other external features, such as temperature, humidity, and wind speed, can also impact energy consumption, here I will be focusing on extracting and transforming time series features.

Cool — now you have gone from having essentially 0 usable features to 3.
But not so fast! As we know, when it comes to ML, we cannot pass these features into our model as is. Most models, if given these inputs, will interpret them as numerical features – and the reality is that time series features are not numerical, they are categorical.
For instance, when it comes to energy consumption, there are certain peak hours of the day where higher consumption is more likely. There are also specific hours which tend to have lower consumption. Each hour is in a sense its own category.
Zooming in on specific parts of this dataset showcases this. There are clear consumption patterns throughout the day – the usage tends to peak around the same hour (5–6PM) and is lowest at 5–7AM.
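If you want to verify this pattern yourself, a quick way is to average consumption by hour of day. Below is a minimal sketch that assumes the consumption column is called 'AEP_MW' (the name will differ depending on which region's file you use):
import matplotlib.pyplot as plt
# Average consumption for each hour of the day
# ('AEP_MW' is an assumed column name -- swap in your dataset's consumption column)
hourly_profile = df.groupby('Hour')['AEP_MW'].mean()
hourly_profile.plot(kind='bar')
plt.xlabel('Hour of day')
plt.ylabel('Mean consumption (MW)')
plt.show()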

Obviously, these patterns have complex interactions with the other features, such as the time of year, month, and day of week, which is why we try to include as much information as we can in our model.
So how can we do this? If you are like most people, you learned early on that categorical features need to be encoded in some other format in order for the model to properly understand what they are. The most well known way to do this is one-hot encoding.
One-hot encoding is simple and straightforward to implement. For any given hour of the day (or month, day, etc.), it asks "is it hour/day/month n?" and answers with a binary 0 or 1 variable. It does this for every category, so a single day_of_week feature becomes 7 encoded features (representing the 7 days of the week):
- is_day_1
- is_day_2
- is_day_3
- is_day_4
- is_day_5
- is_day_6
- is_day_7
In Python, the easiest way to do this is by using pd.get_dummies:
columns_to_encode = ['Hour', 'Month', 'Dayofweek']
df = pd.get_dummies(df, columns=columns_to_encode)
This will produce the new feature set.
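To see what this looks like on a small scale, here is a toy example (the Dayofweek_* column names follow pd.get_dummies' default prefix_value convention):
toy = pd.DataFrame({'Dayofweek': [0, 1, 6]})
print(pd.get_dummies(toy, columns=['Dayofweek']))
# One indicator column per value (Dayofweek_0, Dayofweek_1, Dayofweek_6 here),
# with exactly one 1/True per row -- the exact dtype depends on your pandas version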

As you can see, that is a ton of features. We went from having 3 columns (hour, month, day of week) to over 40 (24 hours + 12 months + 7 days of the week). This becomes increasingly messy as you add more time series features that need to be encoded, and it can be hard to keep track of them all, especially if you want to do things like store features in a database or visualize feature importances (without ending up with an extremely cluttered graph).
An alternative: Cyclical encoding
Time series features are cyclical by nature. When the clock strikes 24:00 (12AM), a new day begins and the next hour is 1:00 (1AM). Although 1 and 24 are the furthest apart numerically, hour 1 is just as close to hour 24 as hour 23 is, because they sit on a cycle.
So, another way you can represent time series features numerically is by transforming the timestamps into sine and cosine transforms. This will essentially tell you the time of day, time of week, or time of year.
Instead of transforming datetime values into categorical features (as we do with one-hot encoding), we are transforming them into numerical features where some values are closer together (eg 12AM and 1AM) and others are further apart (eg 12AM and 12PM). This kind of information gets lost when we one-hot encode.
Sine and cosine come from the unit circle, and the idea is to map where a timestamp lies on this circle using sine and cosine coordinates. Think of the right side of the circle as the starting point (denoted by 0 on the chart below), or 00:00 (12AM) on a real 24-hour timescale, which we will divide into four 6-hour landmarks in order to map hours onto the circle.
As you move counterclockwise around the unit circle, the angle increases to pi/2 (90 degrees), which is the equivalent of 6:00AM, then pi (180 degrees) or 12:00PM, then 3pi/2 (270 degrees) or 6:00PM, and finally back to 0 at 12AM. Each time point between these landmarks has its own unique coordinates. In this way we can represent the 24-hour daily cycle using sine and cosine.
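To make this concrete, here is a small sketch (not from the original walkthrough) that maps an hour of the day onto the circle; notice that hours 1 and 23 land close together even though they are far apart numerically:
import numpy as np
# Map an hour (0-23) to an angle on the 24-hour circle, then take sin/cos
def hour_to_circle(hour):
    angle = 2 * np.pi * hour / 24
    return np.sin(angle), np.cos(angle)
print(hour_to_circle(1))   # ~( 0.26, 0.97)
print(hour_to_circle(23))  # ~(-0.26, 0.97) -- right next to hour 1 on the circle
print(hour_to_circle(12))  # ~( 0.00, -1.00) -- opposite side, as far away as possible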

The same can be done with other cycles, such as time of week or year.
To accomplish this in Python, you need to first convert the datetime (which in my case is hourly timestamps) into a numerical variable. By mapping pd.Timestamp.timestamp over the column, we convert each timestamp into Unix time (the number of seconds that have passed since January 1, 1970).
Now you can transform this numerical column into sine and cosine features.
import numpy as np
# Convert datetime into Unix time (the number of seconds since January 1, 1970)
timestamp_s = df['Datetime'].map(pd.Timestamp.timestamp)
# Get the number of seconds in each cycle
day = 24 * 60 * 60
week = day * 7
year = day * 365.2425
# Transform using sin and cos
# Time of day
df['Day_sin'] = np.sin(timestamp_s * (2 * np.pi / day))
df['Day_cos'] = np.cos(timestamp_s * (2 * np.pi / day))
# Time of week
df['Week_sin'] = np.sin(timestamp_s * (2 * np.pi / week))
df['Week_cos'] = np.cos(timestamp_s * (2 * np.pi / week))
# Time of year
df['Year_sin'] = np.sin(timestamp_s * (2 * np.pi / year))
df['Year_cos'] = np.cos(timestamp_s * (2 * np.pi / year))
Broadly, here is what is going on: the period you divide by (day, week, or year) is the duration of the cycle in seconds, and the 2 * np.pi factor is there because there are 2 pi radians in a full circle/cycle. Multiplying each Unix timestamp (in seconds) by 2 * pi / period therefore converts it to an angle in radians that represents its position within the cycle.
For example, if the period is day, then a timestamp at the start of the day would be mapped to 0 radians, a timestamp at the middle of the day would be mapped to np.pi radians, and a timestamp at the end of the day would be mapped to 2 * np.pi radians.
Finally, we take the sin and cos of the resulting angle to get the actual x and y coordinate values on the unit circle. These will always be between -1 and 1. (The raw angle keeps growing past 2 pi as the days go by, but because sine and cosine are periodic, timestamps at the same point in the cycle always land on the same coordinates.)
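As a quick sanity check of the daily features (assuming your timestamps are timezone-naive, which pandas treats as UTC when converting to Unix time), noon should sit exactly opposite midnight on the circle:
t = pd.Timestamp('2018-01-01 12:00:00').timestamp()  # any date, at noon
print(np.sin(t * (2 * np.pi / day)))  # ~0
print(np.cos(t * (2 * np.pi / day)))  # ~ -1: noon is directly opposite midnight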
With this approach, each original time series feature (eg Hour of day, Day of week, Month of year) now maps onto only 2 new features (sin and cos of that original feature) as opposed to 24, 7, 12, etc.
The cons of this approach
It's important to be careful when using this method. Though it is very convenient and efficient, there are some drawbacks and considerations:
- One-hot encoding could work better for datasets whose values differ consistently at specific hours, months, etc. – for example, a dataset where usage overwhelmingly peaks at exactly 12PM or during one particular month. In datasets where the peak is spread over a range (say 12PM–2PM), a more fluid approach like cyclical encoding could be more accurate.
- This type of encoding works well for deep learning/neural networks, but potentially not for tree splitting algorithms like Random Forest. The reason is that a single timestamp, which would normally be 1 feature, gets split into 2 features, and tree-based algorithms make splitting decisions one feature at a time. The model therefore processes the 2 features separately, when in reality they are a coordinate pair corresponding to 1 original feature.
However, this is not to say you can never use cyclical encoding for tree-based algorithms. I have actually used this type of encoding in a Random Forest model and had good results. It will really depend on your dataset, so it's important to still run metrics on cross-validation and a final hold-out test set to be sure.
Additionally, it's important to compare one-hot encoding results to cyclical encoding results before you decide what you will use.
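One minimal way to run that comparison is to score both feature sets with the same model and a time-aware split. The sketch below is illustrative rather than a full experiment: it assumes a target column named 'AEP_MW' and uses scikit-learn's RandomForestRegressor with TimeSeriesSplit (neither of which is prescribed here), scoring on mean absolute error:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
# 'AEP_MW' is an assumed target column name -- adjust to your dataset
target = df['AEP_MW']
onehot_cols = [c for c in df.columns if c.startswith(('Hour_', 'Month_', 'Dayofweek_'))]
cyclical_cols = ['Day_sin', 'Day_cos', 'Week_sin', 'Week_cos', 'Year_sin', 'Year_cos']
cv = TimeSeriesSplit(n_splits=5)
model = RandomForestRegressor(n_estimators=100, random_state=42)
for name, cols in [('one-hot', onehot_cols), ('cyclical', cyclical_cols)]:
    scores = cross_val_score(model, df[cols], target, cv=cv,
                             scoring='neg_mean_absolute_error')
    print(f'{name}: MAE = {-scores.mean():.2f}')
TimeSeriesSplit keeps each validation fold strictly after its training data, which avoids leaking future information into the comparison.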
References
- P. Bescond, Cyclical features encoding, it's about time! (2020), Towards Data Science
- R. Mulla, Hourly Energy Consumption (August 2022), Retrieved May 2024 from Kaggle
- TensorFlow, Time series forecasting (2024), TensorFlow Core