AutoGluon-TimeSeries: Every Time Series Forecasting Model In One Library

Image created by author using Stable Diffusion

The open-source landscape for time series is heating up.

This includes successful libraries such as Darts, GluonTS, and Nixtla.

Last year, Amazon released an extension of its AutoGluon library focused on time series, known as AutoGluon-TimeSeries (AG-TS) [1].

AG-TS leverages the expertise of other libraries:

  • From Amazon itself (AutoGluon and GluonTS).
  • From Nixtla (StatsForecast and MLForecast).

And the best part: AG-TS has a user-friendly API, letting us get predictions with only a few lines of code!

This article explores AG-TS and outlines its capabilities. We will also construct a simple project, utilizing the widely known Tourism dataset[2].

Let's dive in!

I've launched AI Horizon Forecast, a newsletter focusing on time-series and innovative AI research. Subscribe here to broaden your horizons!

What is AutoGluon-TimeSeries?

AutoGluon-TimeSeries is an AutoML time-series framework, focusing on probabilistic forecasting and leveraging ensembling.

AG-TS offers:

  • Open-source code: Available on GitHub; the library is part of Amazon's AutoGluon suite.
  • A user-friendly API: Load your data, then call the fit() and predict() methods.
  • Model variety: Access to SOTA forecasting models across various categories, including statistical models, tree-based approaches, and deep-learning models.
  • Powerful ensembles: AG-TS automatically configures model ensembles, a key strategy in forecasting.
  • Probabilistic output: Users can generate point forecasts along with prediction intervals.
  • Superior performance: AG-TS excels in an evaluation on 29 benchmark datasets, outperforming many established forecasting methods [1].

Models of AG-TS

AG-TS relies on ensembling to achieve highly accurate predictions.

The library uses numerous state-of-the-art models to construct the ensemble. These models fall into 2 categories:

  • Local models: One model is trained per time series; this group mostly contains statistical models.
  • Global models: These are fitted on all time series of the dataset, benefiting from cross-learning. This category includes powerful deep-learning forecasting models like Temporal Fusion Transformer and DeepAR, as well as tree-based models such as LightGBM.

The complete list of all models can be found on the AG-TS GitHub page.

As you can see, AG-TS contains:

  • Traditional statistical models (ARIMA, ETS, etc.)
  • Their automatically tuned versions (AutoARIMA, AutoETS)
  • Newer statistical models (AutoCES, DynamicOptimizedTheta)
  • Boosted trees (DirectTabular and RecursiveTabular, which wrap tabular regression models such as gradient-boosted trees)
  • Modern deep-learning models: Temporal Fusion Transformer and the more recent PatchTST
  • Specialized models: for instance, the Croston models, which are suitable for sparse data (intermittent forecasting)

Note: AG-TS is constantly updated with new models, so this list may change in the future.
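Besides presets (covered next), fit() also accepts a hyperparameters dictionary keyed by model name, so you can hand-pick which of these models to train. Below is a minimal sketch, assuming the model keys listed in the AG-TS docs and a train_data TimeSeriesDataFrame like the one we build later in this article:

from autogluon.timeseries import TimeSeriesPredictor

# Sketch: train a hand-picked subset of models instead of a preset.
# Empty dicts mean "use this model's default configuration".
predictor = TimeSeriesPredictor(prediction_length=24, eval_metric="MSE")
predictor.fit(
    train_data,  # a TimeSeriesDataFrame (see the Load Data section below)
    hyperparameters={
        "ETS": {},
        "Theta": {},
        "DeepAR": {},
    },
)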

Automatic Presets

You can fit SOTA models and obtain predictions in less than 10 lines of Python code.

To achieve this, AG-TS provides 4 pre-defined presets that determine the quality of the forecasting model.

These presets determine the types of models included in each ensemble, effectively offering a trade-off between accuracy and training time. The available options are:

  • fast_training: Simple statistical models, such as Theta and ETS. Fast, but not very accurate.
  • medium_quality: All options above + DeepAR.
  • high_quality: All options above, + automatically tuned statistical models (e.g. AutoETS) + Tree-based models + Deep Learning models (Temporal Fusion Transformer, PatchTST).
  • best_quality: All options above + more training copies of DeepAR. Much more accurate, but takes longer to train.

In the next section, we will explore AG-TS using the fast_training preset on a popular time-series dataset.

Load Data

I've made a notebook that runs the code from the article. You can find it here:

Get Project #1

Note: We'll only see a minimal example here. Project #1 contains more advanced cases, including deep-learning models, cross-validation, etc.

We'll use the Tourism dataset from Kaggle's Tourism forecasting competition, which can be directly loaded from GluonTS:

# Install the required libraries first (shell commands):
# pip install gluonts
# pip install autogluon

import os

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from gluonts.dataset.repository import get_dataset
from gluonts.dataset.util import to_pandas
from gluonts.evaluation.metrics import mse
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Download the monthly Tourism dataset
dataset = get_dataset("tourism_monthly")

Regarding the tourism-monthly dataset:

  • The dataset contains 366 time series.
  • It comes pre-split into train and test datasets.
  • The test set contains the full original series.
  • In the train set, the last prediction_length time steps are removed from the end of each time series.
  • Our goal is to predict the next 2 years = 24 months, so we set prediction_length = 24 (also called the forecasting horizon). We can confirm these properties from the dataset metadata, as sketched below.
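A quick sanity check (a sketch; these attributes come from GluonTS's dataset metadata):

# Sketch: inspect the dataset metadata to confirm frequency and horizon.
print(dataset.metadata.freq)               # monthly frequency (e.g. "1M")
print(dataset.metadata.prediction_length)  # 24
print(len(dataset.train))                  # 366 series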

Regarding evaluation:

  • Testing evaluation will be conducted on the last prediction_length values of each test series. The final test score will be the average across all those time series.
  • Similarly, validation will be performed on the last prediction_length values of each train series (followed by averaging). We'll explore more sophisticated validation techniques later.
  • The evaluation metric we'll use is MSE.

Next, let's plot the first time series in the tourism-monthly dataset:

# Grab the first train/test entries and convert them to pandas Series
train_entry = next(iter(dataset.train))
test_entry = next(iter(dataset.test))

test_series = to_pandas(test_entry)
train_series = to_pandas(train_entry)

fig, ax = plt.subplots(2, 1, sharex=True, sharey=True, figsize=(10, 7))

train_series.plot(ax=ax[0])
ax[0].grid(which="both")
ax[0].legend(["train series"], loc="upper left")

test_series.plot(ax=ax[1])
ax[1].axvline(train_series.index[-1], color="r")
ax[1].grid(which="both")
ax[1].legend(["test series", "end of train series"], loc="upper left")

plt.show()
Figure 1: Visualizing the 1st time series of our dataset – top: training time series, bottom: testing time series

Note: In most time-series libraries, we usually split the data into two separate train and test dataframes. In AG-TS, data wrangling is easier if you treat the original time series as the test set and the original series minus the last prediction_length steps as the train set.
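A one-line sanity check (sketch) confirms this relationship for the first series we just plotted:

# Sketch: each test series extends its train series by exactly the horizon.
assert len(test_series) == len(train_series) + dataset.metadata.prediction_length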

Preprocess Data

Before building our model, we must first convert our dataset to a format that AG-TS understands.

AG-TS expects the data in a "melted" (long) format, with one row per <time series ID, timestamp, target> triplet. The correct format should look like this:

Figure 2: The training dataset
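The accompanying notebook builds these long-format train and test dataframes; a minimal sketch of that step could look as follows (to_long_df is a hypothetical helper, not a library function, and the synthetic ids "T1", "T2", ... mirror the dataset's naming):

def to_long_df(entries):
    # Build a long-format dataframe <item_id, start, target>
    # from GluonTS dataset entries.
    frames = []
    for i, entry in enumerate(entries):
        series = to_pandas(entry)
        idx = series.index
        if isinstance(idx, pd.PeriodIndex):  # newer GluonTS returns a PeriodIndex
            idx = idx.to_timestamp()
        frames.append(pd.DataFrame({
            "item_id": f"T{i + 1}",
            "start": idx,
            "target": series.values,
        }))
    return pd.concat(frames, ignore_index=True)

train = to_long_df(dataset.train)
test = to_long_df(dataset.test)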

Next, we convert the train and test pandas dataframes to AutoGluon TimeSeriesDataFrame objects:

train_data = TimeSeriesDataFrame.from_data_frame(
    train,
    id_column="item_id",
    timestamp_column="start"
)

test_data = TimeSeriesDataFrame.from_data_frame(
    test,
    id_column="item_id",
    timestamp_column="start"
)

test_data.head()
Figure 3: test_data is now a TimeSeriesDataFrame instance and can be used in AutoGluon's fit() and predict() functions

We are now ready to proceed to model training.

Model Training

AG-TS provides 4 powerful ensemble presets.

Let's explore the 1st preset, fast_training – which trains the following models: Naive, SeasonalNaive, Theta, ETS, RecursiveTabular, and their Ensemble.

Note: AG-TS constantly updates which models are used in each preset. For example, in the upcoming AG-TS 1.0 version, fast_training will also include DirectTabular.

The most important arguments are:

  • the path where the models will be saved,
  • the evaluation metric (here MSE),
  • the time_limit, which instructs AG-TS to train as many models as possible within a specified time frame. Here, we omit this argument, meaning AG-TS will run until all models have completed training.
%%time
multiple_timeseries_path = "multiple-timeseries"
model_path_fast = "tourism-monthly-fast"
path = os.path.join(multiple_timeseries_path, model_path_fast)

predictor = TimeSeriesPredictor(
    prediction_length=24,
    path=path,
    target="target",
    eval_metric="MSE"
)

predictor.fit(
    train_data,
    presets="fast_training",
    random_seed=42
)

The output is quite comprehensive.

AG-TS provides extra details about each model, including how long each took to train and which performed best. In our case, the WeightedEnsemble has the best validation score, with a total runtime of approximately 2.5 minutes.

Note: AG-TS multiplies all scores by -1, such that higher "negative losses" correspond to better scores.

Model Evaluation

Generating predictions is straightforward:

predictions = predictor.predict(train_data, random_seed=42)
predictions

We obtain predictions for the test window of every time series (item_id).

But more importantly, since AG-TS focuses on probabilistic forecasting, we get 2 types of forecasts:

  • point forecasts (mean): the expected value of the time series at each time step in the forecast horizon.
  • quantile forecasts: the quantiles of the forecast distribution, serving as prediction intervals. By default, AG-TS outputs forecasts for the quantiles [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] – a quick way to check this is sketched below.
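A quick way to verify this is to inspect the columns of the returned dataframe (a sketch; the quantile columns are string-named):

# Sketch: the forecast dataframe exposes 'mean' plus the nine quantiles.
print(predictions.columns.tolist())
# e.g. ['mean', '0.1', '0.2', '0.3', '0.4', '0.5', '0.6', '0.7', '0.8', '0.9']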

Subsequently, we calculate validation MSE and test MSE using AG-TS's predictor.evaluate() method. To validate these scores, we also create our predictions_df dataframe and compute the test MSE manually:

# Scores from AG-TS (note: AG-TS reports losses multiplied by -1)
fast_valid_loss = predictor.evaluate(train_data)
fast_test_loss = predictor.evaluate(test_data)

# Manual check: align the mean forecasts with the last 24 actuals per series
# (relies on train/test sharing the same item ordering)
predictions_df = pd.DataFrame({
    'item_id': test.groupby('item_id').tail(24)['item_id'].values,
    'predictions': predictions.reset_index()['mean'],
    'target': test.groupby('item_id').tail(24)['target'].values
})

# Per-series MSE, then average across series
predictions_df = (
    predictions_df
    .groupby('item_id', sort=False)
    .apply(lambda g: mse(g['target'], g['predictions']))
    .reset_index()
)
predictions_df.columns = ['item_id', 'MSE']

print(f"Validation loss: {fast_valid_loss},  Test loss: {fast_test_loss},  Test loss manual {predictions_df['MSE'].mean()}")

#Validation loss: -161640431.36874282,  Test loss: -54499463.986354746,  Test loss manual 54499463.986354746

As expected, our manual calculation of the test score matches that of AG-TS. Also, we obtained the same validation loss as the one displayed in the training details output (the score of WeightedEnsemble).

My favorite command is predictor.leaderboard() – because it clearly and concisely displays the scores and prediction times for both validation and test data:

predictor.leaderboard(test_data, silent=True)

How cool is that!

Also, take note of SeasonalNaive. Despite ranking as the 4th best on the validation set, it achieves the 2nd best score on the test set – remarkable considering it's just a baseline model!

Plotting Predictions

Finally, let's plot the predictions for the first time series of our dataset, T1. The plot also illustrates the P10 and P90 quantile levels:
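The full plotting code lives in the notebook; a minimal sketch that reproduces such a plot (assuming item id "T1" and the default quantile column names) could look like this:

# Sketch: plot history, mean forecast, and the P10-P90 band for item "T1".
item = "T1"
history = train_data.loc[item]["target"]
forecast = predictions.loc[item]

plt.figure(figsize=(10, 4))
history.plot(label="history")
forecast["mean"].plot(label="mean forecast")
plt.fill_between(
    forecast.index,
    forecast["0.1"],
    forecast["0.9"],
    alpha=0.3,
    label="P10-P90 interval",
)
plt.legend()
plt.show()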

Figure 4: Predictions of fast-training preset for Item T1 (the first time series of our dataset).

The predictions are highly accurate.

Additionally, observe the wider prediction intervals at the lower points of the predicted sequence – indicating the model's increased uncertainty in those areas.

What's Next

This concludes our fast_training preset benchmark.

Since we want to stay true to the article's purpose, we only provided a minimal example to showcase how AG-TS works.

If you want to delve deeper into more advanced cases, including:

  • using powerful DL models,
  • error analysis,
  • loading and saving models,
  • cross-validation,
  • backtesting,
  • hyperparameter tuning,
  • adding covariates/static/exogenous variables,
  • custom presets,

feel free to check Project #1 here. We will also explore these in a Part-2 article.

Remember, AG-TS has a user-friendly API, but using only the defaults has limitations. For our dataset, the basic statistical ensemble worked well.

Closing Remarks

AutoGluon–TimeSeries stands as an exceptional tool for the forecasting community, building upon Amazon's foundational work in forecasting literature.

In essence, AG–TS is a powerful and robust library capable of generating accurate forecasts with just a few lines of code.


Thank you for reading!

Up next: AutoGluon-TimeSeries: Creating Powerful Ensemble Forecasts – Complete Tutorial

References

[1] Shchur et al., AutoGluon–TimeSeries: AutoML for Probabilistic Time Series Forecasting [2023]

[2] Tourism Dataset https://www.kaggle.com/c/tourism1, Public domain

All images used in the article have been created by the author

