Time Series for Climate Change: Forecasting Wind Power
Towards Clean Energy Production
Non-renewable energy sources impose a heavy ecological footprint on our planet. This issue has prompted scientific and technological advances in clean energy sources, such as solar, wind, and ocean wave energy. Unlike coal or oil, these sources are friendly to the environment.
One of the factors delaying the widespread adoption of clean energy sources is their irregularity. They are highly variable sources, which makes their behavior difficult to predict.
So, forecasting the conditions of these sources is a key challenge. Accurate predictions are essential for the efficient production of clean energy.
In this article, we'll develop a model to forecast wind power.
Wind Power
Wind power is an increasingly established source of renewable energy. As of 2020, wind power accounted for about 47% of Denmark's electricity generation. Other countries have increased their share of wind power in the electricity grid as well.
Wind power also has some disadvantages, such as the visual impact and noise of wind turbines. In addition, wind power infrastructure requires a sizable initial investment.
The integration of wind power into the electricity grid is also difficult. Wind power can only be generated if the wind is blowing. This makes it an intermittent and unpredictable energy source. So, it needs to be coupled with other alternatives.
Wind ramps
Wind ramps are also a major concern for power system operators. These are large changes in wind power over short periods of time (minutes to hours). If not detected in time, wind ramps can compromise the reliability of the electricity grid.
Wind ramps can be either upward or downward changes. When a sudden drop in power occurs, output from other sources must be raised to compensate for the loss. Sudden upward changes can prompt operators to decrease the output from other sources or, instead, to sell the surplus energy.
The role of forecasting
Power system operators rely on forecasting models to predict wind conditions. These models enable operators to balance and integrate several energy sources efficiently. Accurate forecasts are important for the efficiency of the electricity grid, and also to reduce costs.
Hands-On
In the rest of this article, we'll build a model to forecast wind power. The goal is to show why this problem is challenging and how future developments could bring value.
You can find the complete code for this project on GitHub:
Data set
In this tutorial, we'll use a public dataset about wind power generation from a Belgian wind farm.
The time series is collected in 15-minute intervals from 2014 to 2018. Besides wind power, we also have information about the installed capacity (the maximum power that can be generated):

Installed capacity increases over time as new wind turbines are added to the farm. So, we normalize wind power by the farm's capacity. This leads to a measure of wind power as a percentage of the total capacity.
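Here's a minimal sketch of this preprocessing step. The file name, the column names, and the hourly resampling are assumptions used for illustration only; the actual preparation code is available in the repository:
import pandas as pd
# assumed file and column names, for illustration purposes only
wind = pd.read_csv('wind_power.csv', parse_dates=['datetime'], index_col='datetime')
# aggregating the 15-minute records into hourly averages (assumed granularity)
wind = wind.resample('H').mean()
# wind power as a percentage of the installed capacity
series = wind['measured_power'] / wind['installed_capacity'] * 100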

Building a Forecasting Model
We'll build a forecasting model using machine learning algorithms. The idea is to apply a modeling technique called auto-regression. Auto-regression involves using recent past observations (lags) to predict future ones. You can read more about auto-regression in a previous article.
First, we need to transform the time series into a tabular format. This can be done with a method based on a sliding window called time delay embedding:
from sklearn.model_selection import train_test_split
# src module here: https://github.com/vcerqueira/tsa4climate/tree/main/src
from src.tde import time_delay_embedding
# number of lags and forecasting horizon
N_LAGS, HORIZON = 24, 24
# leaving last 20% of observations for testing
train, test = train_test_split(series, test_size=0.2, shuffle=False)
# transforming time series into a tabular format for supervised learning
X_train, Y_train = time_delay_embedding(train, n_lags=N_LAGS, horizon=HORIZON, return_Xy=True)
X_test, Y_test = time_delay_embedding(test, n_lags=N_LAGS, horizon=HORIZON, return_Xy=True)
We set both the number of lags and the forecasting horizon to 24. At each time step, we want the model to predict wind power over the next 24 hours using the previous 24 hours as input.
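The time_delay_embedding function comes from the project's src module. As a rough sketch of the underlying idea (not the repository's implementation), the transformation can be written with pandas shifts, where positive shifts produce the lagged inputs and negative shifts produce the future targets:
import pandas as pd

def sliding_window_table(series: pd.Series, n_lags: int, horizon: int) -> pd.DataFrame:
    # lagged inputs: values at t, t-1, ..., t-(n_lags-1)
    lags = {f'lag_{i}': series.shift(i) for i in range(n_lags)}
    # future targets: values at t+1, ..., t+horizon
    future = {f't_plus_{i}': series.shift(-i) for i in range(1, horizon + 1)}
    # drop rows with incomplete windows at the boundaries
    return pd.DataFrame({**lags, **future}).dropna()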
Here's a sample of the training explanatory and target variables:

Next, we use the training set to select a model. In this tutorial, we do a random search to select and optimize a regression algorithm. We also test whether a feature extraction step should be included. The particular feature extraction process is based on summary statistics of the lags (a sketch of this idea follows the code below).
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit
from src.model_selection import (MetaEstimator,
                                 search_space_with_feature_ext)

# Create a pipeline for hyperparameter optimization
# 'feature' contains different possibilities for feature extraction
# 'model' contains different regression algorithms and respective hyperparameters
pipeline = Pipeline([('feature', FunctionTransformer()),
                     ('model', MetaEstimator())])

# do random search optimization for model selection
search_mod = RandomizedSearchCV(estimator=pipeline,
                                param_distributions=search_space_with_feature_ext,
                                scoring='r2',
                                n_iter=30,
                                n_jobs=1,
                                refit=True,
                                verbose=2,
                                cv=TimeSeriesSplit(n_splits=3),
                                random_state=123)

search_mod.fit(X_train, Y_train)

print(search_mod.best_estimator_)
# Pipeline(steps=[('feature', FunctionTransformer()),
#                 ('model', RidgeCV(alphas=0.25))])
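The search space (search_space_with_feature_ext) and the MetaEstimator wrapper are defined in the project's src module. For illustration only, and not the repository's exact implementation, a feature extraction step based on lag summary statistics could be expressed as a FunctionTransformer that appends the mean and standard deviation of the lags to the original features:
import numpy as np
from sklearn.preprocessing import FunctionTransformer

def lag_summary_stats(X):
    # append per-row summary statistics of the lags to the original features
    X = np.asarray(X)
    stats = np.column_stack([X.mean(axis=1), X.std(axis=1)])
    return np.hstack([X, stats])

# this transformer could be plugged into the 'feature' step of the pipeline
feature_step = FunctionTransformer(lag_summary_stats)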
The selected model is a Ridge regression without feature extraction. We re-train this model using the complete training set and evaluate it on the test set.
import pandas as pd
from sklearn.metrics import r2_score

# forecasting testing observations using the selected model
Y_hat_test = search_mod.predict(X_test)
Y_hat_test = pd.DataFrame(Y_hat_test, columns=Y_train.columns)

# evaluating the selected model over the forecasting horizon
r2_scores = {col: r2_score(y_true=Y_test[col], y_pred=Y_hat_test[col])
             for col in Y_hat_test}
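To see how the scores evolve along the forecasting horizon, we can collect them into a pandas Series and plot them (a quick sketch, assuming matplotlib is installed):
import matplotlib.pyplot as plt

# R2 score for each step of the forecasting horizon
scores = pd.Series(r2_scores)
scores.plot(kind='bar', xlabel='Horizon', ylabel='R2 score')
plt.show()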
The results uncover two main challenges:
- predicting over large horizons;
- predicting extreme observations.
Increasing unpredictability across the horizon
It's well known that long-term forecasting (i.e., more than a few time steps ahead) is more difficult than short-term forecasting. Forecasting wind power follows the same tendency:

Predicting wind power in the next hour is easy. The R2 score is about 0.98 – almost perfect. Yet, performance decreases considerably as we forecast further ahead in the horizon.
Predicting for the long term (in this case, more than a few hours) is important to balance the energy demand and supply efficiently.
Predicting extreme values

We also need the model to be accurate in predicting extreme values. In this case, these represent high or low wind power. Such values are important to anticipate because they can have an impact on grid operations.
The above figure shows a scatter plot with predicted and actual values. The red dashed line is the ideal case where the predicted value matches the observed one.
For the most part, the data points revolve around the red line. But, for extreme observations, the points deviate from the line. So, the model sometimes fails to anticipate extreme values.
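A plot like this one can be reproduced for a single step of the horizon with a few lines of matplotlib (again, a sketch rather than the exact figure above):
import matplotlib.pyplot as plt

# predicted vs. actual wind power for the first step of the horizon
col = Y_test.columns[0]
plt.scatter(Y_test[col], Y_hat_test[col], s=5, alpha=0.5)

# reference line where predictions match the observed values
lims = [Y_test[col].min(), Y_test[col].max()]
plt.plot(lims, lims, 'r--')
plt.xlabel('Observed')
plt.ylabel('Predicted')
plt.show()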
Overcoming these two limitations is important to integrate wind power into the electricity grid.
Key Takeaways
- Wind power is an increasingly popular source of renewable energy;
- Forecasting wind conditions is a key task for estimating the energy produced from this source;
- This forecasting task can be tackled with models based on auto-regression;
- Forecasting wind power is challenging for two reasons: low long-term forecasting performance, and low performance in extreme observations.
Thank you for reading. This article is the first installment of a series of posts about Time Series for Climate Change. Stay tuned for more!
References
[1] Wind power generation in Belgium Wind Farms (License CC BY 4.0)
[2] Rolnick, David, et al. "Tackling climate change with machine learning." ACM Computing Surveys (CSUR) 55.2 (2022): 1–96.