Influential Time-Series Forecasting Papers of 2023-2024: Part 1

Let's kick off 2025 with a roundup of notable time-series forecasting papers.
I've also included some key papers from 2023. 2024 was the year of foundation forecasting models, but I'll cover those in a separate article.
The papers I'll cover are:
- Deep Learning-Based Forecasting: A Case Study From the Online Fashion Industry
- Forecasting Reconciliation: A Review
- TSMixer: An All-MLP Architecture for Time Series Forecasting
- CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting
- Long-Range Transformers for Dynamic Spatiotemporal Forecasting
Let's get started!
Deep Learning-Based Forecasting: A Case Study From the Online Fashion Industry
This paper is a gem.
It initially made headlines because Zalando, a leading online fashion retailer, used a custom Transformer model to outperform SOTA forecasting models, including LightGBM.
But there is more to it: the paper goes into real depth and is authored by brilliant researchers in the time-series/ML ecosystem.
Paper Insights Overview
The paper describes Zalando's in-depth retail forecasting pipeline and offers valuable insights:
- Challenges unique to online retail fashion forecasting.
- How to handle sparsity, cold starts, and short history (Figure 1).
- Which external covariates were used, and how the Transformer is built to leverage them.
- How elegant tricks, with interesting explanations, improved the vanilla Transformer and tailored it to time-series forecasting.
- Extensive details on the custom loss function, evaluation metrics, hyperparameters, and training configurations, which most papers rarely disclose.

The major contributions/key points of this paper are:
- Covariate Handling: Using discount as a dynamic covariate.
- Causal Relationship: Enforcing a monotonic relationship between discount and demand via a piecewise linear function parameterized by feedforward networks.
- Two forecasting modes: Training and predicting short- and long-term demand with a single global model.
- Scaling laws: Demonstrating the first sparks of scaling laws in Transformer forecasting models.
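To make the monotonicity constraint concrete, here is a minimal NumPy sketch of one way to build a non-decreasing piecewise-linear map from discount to demand. The details are my assumptions, not the paper's: `raw_slopes` stands in for the output of a feedforward network, the `knots` are fixed by hand, and softplus keeps every slope increment non-negative. The paper's exact parameterization may differ.

```python
import numpy as np

def softplus(z):
    # Numerically stable softplus: log(1 + exp(z)), always >= 0
    return np.logaddexp(0.0, z)

def monotonic_piecewise_linear(x, bias, raw_slopes, knots):
    """Evaluate a non-decreasing piecewise-linear function of x (e.g. discount).

    raw_slopes is a stand-in for the output of a hypothetical feedforward
    network; softplus turns each entry into a non-negative slope increment,
    so the function can only rise as x grows.
    """
    x = np.asarray(x, dtype=float)[..., None]          # shape (..., 1)
    increments = softplus(np.asarray(raw_slopes))      # shape (K,), all >= 0
    hinges = np.maximum(x - np.asarray(knots), 0.0)    # shape (..., K), ReLU hinges
    return bias + (hinges * increments).sum(axis=-1)   # sum of non-negative ramps

discounts = np.linspace(0.0, 0.7, 8)
demand_multiplier = monotonic_piecewise_linear(
    discounts,
    bias=1.0,
    raw_slopes=np.array([-2.0, 0.5, 3.0]),  # unconstrained "network outputs"
    knots=np.array([0.0, 0.2, 0.4]),        # hypothetical breakpoints
)
print(np.round(demand_multiplier, 3))
```

Because every ramp has a non-negative slope, the sum can never decrease, whatever values the network produces. That is the point of the construction: monotonicity holds by design rather than being learned from data.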
First, the authors frame the problem as demand forecasting rather than sales forecasting, translating the observed sales into demand. Demand planning is the most common use case in retail forecasting.
The authors also cleverly impute demand when it is not observed (e.g., during a stock shortage) instead of marking it as missing.
Note: Demand forecasting is the first and most important step in a supply planning pipeline.
Demand represents what customers want to buy, while sales are restricted by stock availability. Forecasting demand is crucial, as it reflects true customer needs, unlike sales, which are supply-dependent.
If sales drop to zero due to stock shortages, future forecasts based on sales will also reflect zero. This can mislead automated supply planning systems, leading to no new orders.
To ensure an efficient supply chain, always forecast demand instead of sales. Accurate demand forecasts prevent disruptions and maintain stock to meet customer needs.
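A toy example shows why stockouts make sales-based forecasts misleading. This is a hypothetical sketch: the imputation rule (fill out-of-stock weeks with the mean of in-stock weeks) and the naive moving-average forecaster are deliberately simple stand-ins, not the imputation or model used in the Zalando paper.

```python
import numpy as np

def impute_censored_demand(sales, in_stock):
    """Replace sales in out-of-stock weeks with a demand estimate.

    Assumption: a simple illustrative rule (mean of in-stock weeks),
    not the method from the paper.
    """
    sales = np.asarray(sales, dtype=float)
    in_stock = np.asarray(in_stock, dtype=bool)
    if not in_stock.any():
        raise ValueError("need at least one in-stock week to estimate demand")
    demand = sales.copy()
    demand[~in_stock] = sales[in_stock].mean()
    return demand

def naive_forecast(history, window=4):
    # Forecast next week as the mean of the last `window` observations.
    return float(np.mean(history[-window:]))

weekly_sales = np.array([10, 12, 11, 13, 0, 0, 0, 0])   # last 4 weeks: out of stock
in_stock = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)

print(naive_forecast(weekly_sales))                                    # → 0.0
print(naive_forecast(impute_censored_demand(weekly_sales, in_stock)))  # → 11.5
```

The sales-based forecast of zero would tell an automated replenishment system to order nothing, while the demand-based forecast keeps reflecting what customers actually wanted to buy.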
Zalando's Forecasting Pipeline
The paper categorizes features into 4 types:
- static-global: time-independent & market-independent
- dynamic-global: time-dependent & market-independent
- dynamic-international: time-dependent & market-specific
- static-international: time-independent & market-specific
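The four types come from crossing two yes/no questions: does the feature change over time, and does it differ by market? A small sketch of that mapping follows. The feature names in `examples` are hypothetical illustrations, not the paper's actual feature list.

```python
from enum import Enum

class FeatureType(Enum):
    STATIC_GLOBAL = "static-global"
    DYNAMIC_GLOBAL = "dynamic-global"
    DYNAMIC_INTERNATIONAL = "dynamic-international"
    STATIC_INTERNATIONAL = "static-international"

def feature_type(time_dependent: bool, market_specific: bool) -> FeatureType:
    """Map the two taxonomy axes (time, market) to one of the four types."""
    if time_dependent:
        return (FeatureType.DYNAMIC_INTERNATIONAL if market_specific
                else FeatureType.DYNAMIC_GLOBAL)
    return (FeatureType.STATIC_INTERNATIONAL if market_specific
            else FeatureType.STATIC_GLOBAL)

# Hypothetical features: (time_dependent, market_specific)
examples = {
    "product_brand": (False, False),
    "calendar_week_of_year": (True, False),
    "market_country_code": (False, True),
    "market_discount": (True, True),
}
for name, (dynamic, per_market) in examples.items():
    print(name, "->", feature_type(dynamic, per_market).value)
```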

All covariates improve performance, but discount is the most influential. The data has a weekly frequency.
The Zalando team trained a global Transformer model with:
- Input: A single tensor of dimension R