An Introduction to Quantile Loss, a.k.a. the Pinball Loss
There are many articles on regression evaluation metrics such as MSE, MAE, and RMSE. These metrics are the right choice when we care about the mean or median prediction. However, when we want to train our models to focus on other locations in the distribution, we have to use a different metric, one that is not so frequently covered in data science blog posts.
In this article, we will explore the quantile loss, also known as the pinball loss, which is the go-to metric in quantile regression.
A few definitions to get us started
Before explaining the quantile loss, let's quickly go through a few definitions to make sure we are on the same page.
Let's start with a simple one. Regression algorithms predict a continuous variable, for example, the temperature, the price of a stock, or the demand for the latest iPhone.
Now it's time for a refresher from statistics. An α quantile is a value that divides a given set of numbers such that α × 100% of the numbers are less than or equal to this value, while the remaining (1 − α) × 100% of the numbers are greater than or equal to it.
To make it more concrete, the 50th quantile (the median) splits the data so that half the values fall below it, while the other half lie above. Similarly, the 10th quantile marks the point below which we can find 10% of the data, and the 90th quantile marks the point below which we can expect to see 90% of our data.
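We can verify this definition numerically. The following sketch draws some synthetic data (purely for illustration) and checks that the empirical quantiles computed by NumPy behave as described:

```python
import numpy as np

# Synthetic data: 1,000 draws from a normal distribution (illustrative only)
rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=10, size=1_000)

# The 10th, 50th, and 90th quantiles of the sample
q10, q50, q90 = np.quantile(data, [0.1, 0.5, 0.9])

# By definition, roughly 10%, 50%, and 90% of the data
# fall at or below each respective value
print(np.mean(data <= q10))  # approximately 0.1
print(np.mean(data <= q50))  # approximately 0.5
print(np.mean(data <= q90))  # approximately 0.9
```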
Combining the two ideas from before, quantile regression is a type of regression analysis used to estimate the conditional quantiles of a target variable. Thus, it provides a more comprehensive view of the relationship between variables than the mean.
Let's look at an example. Assume we have a quantile regression model predicting the demand for apples tomorrow. Our model forecasts the 90th quantile as 100, which means that according to the model, there is a 90% probability that the actual demand will be 100 or lower.
The intuition behind quantile loss
Before looking at the formula or a plot of the loss, let's first develop some intuition on how to evaluate a quantile (or probabilistic) forecast. To illustrate, consider the previous example:
If a 90th quantile forecast is 100, it indicates a 90% probability that the actual demand will be 100 or lower.
For such a forecast, we would expect the demand to be below 100 in 90% of cases. Given this kind of statement about probability, such a forecast should receive a higher penalty for underpredicting demand (or any other value) than for overpredicting it. This hints at the fact that quantile loss is an asymmetric loss function.
Furthermore, following this logic, the penalty for underestimating should increase with higher quantiles. So, the higher the quantile, the more the quantile loss penalizes underestimates and the less it penalizes overestimates.
Let's now think about the other extreme of the quantile range. The 10th quantile forecast of 100 indicates that 90% of the time, we would expect the actual value to be higher than 100. So, the quantile loss function for the 10th quantile should place a greater penalty on overestimating the true value compared to underestimating it. This would reflect the importance of accurately capturing lower values in the distribution.
The formula
Now that we have developed some intuition, let's look at the formula for quantile loss:
L_α(y, y_hat) = α × (y − y_hat)        if y ≥ y_hat
L_α(y, y_hat) = (α − 1) × (y − y_hat)  if y < y_hat
where α is the quantile, y is the actual value, y_hat is the predicted value, and (y – y_hat) is the prediction error. The first case (row) represents underpredicting, while the second corresponds to overpredicting.
Now to make it a bit more visual, let's inspect the plot showing the quantile loss for the 10th, 50th, and 90th quantiles. We can generate such a plot using the following snippet:
import numpy as np
import matplotlib.pyplot as plt

# Define the pinball loss function
def pinball_loss(y_true, y_pred, quantile):
    return np.where(y_true >= y_pred,
                    quantile * (y_true - y_pred),
                    (quantile - 1) * (y_true - y_pred))

# Generate a range of prediction errors (y_true - y_pred)
errors = np.linspace(-10, 10, 400)
y_pred = 0

# Quantiles
quantiles = [0.1, 0.5, 0.9]
line_styles = ['-', '--', '-.']

# Plotting
plt.figure(figsize=(10, 6))
for q, ls in zip(quantiles, line_styles):
    losses = pinball_loss(errors, y_pred, q)
    plt.plot(errors, losses, linestyle=ls, label=f'Quantile {q*100:.0f}')
plt.axhline(0, color='gray', linestyle='--', linewidth=0.5)
plt.axvline(0, color='gray', linestyle='--', linewidth=0.5)
plt.xlabel('Prediction Error (y_true - y_pred)')
plt.ylabel('Pinball Loss')
plt.title('Pinball Loss for Different Quantiles')
plt.legend()
plt.grid(True)
plt.show()

Inspecting the plot leads to the following conclusions:
- The dashed orange line represents the median. As you can see, the line is symmetric around zero (perfect prediction). In other words, using the median assigns equal weights to underpredictions and overpredictions. Up to a constant factor of 0.5, this is equivalent to using the MAE loss function.
- Values of α < 0.5 make overprediction much more expensive. As such, the model is incentivized to underpredict the target.
- Values of α > 0.5 make underprediction much more expensive. As such, the model is incentivized to overpredict the target.
- The further α is from 0.5, the stronger the incentive for under- or overprediction.
- The lower the quantile loss, the better the model is performing. As we have already mentioned, a loss of 0 represents a perfect score.
Sidenote: The name "pinball loss" comes from the shape of the loss function, which resembles the trajectory of a ball bouncing off the walls of a pinball machine.
Let's look at a concrete example to see how all of this comes together. Assume we are interested in the 90th quantile. The loss function will then look as follows:
L_0.9(y, y_hat) = 0.9 × (y − y_hat)   if y ≥ y_hat
L_0.9(y, y_hat) = −0.1 × (y − y_hat)  if y < y_hat
So, for α = 0.9, underpredictions will be penalized by a factor of 0.9, while a factor of 0.1 will be applied to overpredictions. We can see that, in this case, underpredictions are penalized 9 times more severely than overpredictions. As a result, the regression model will be more concerned about underpredictions and it will tend to predict higher values more often.
On average, we can expect such a model to overpredict in approximately 90% of cases and underpredict in the remaining 10% of cases. And that is basically what a 90th quantile represents.
In the following plot, we can see a concrete example of the asymmetry of quantile loss. Assuming a true value of 100 and a forecast error of 5 (both over and under), underprediction is penalized 9 times more than overprediction.
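We can verify these numbers directly with the pinball_loss function defined in the earlier snippet:

```python
import numpy as np

# Same pinball loss as in the plotting snippet above
def pinball_loss(y_true, y_pred, quantile):
    return np.where(y_true >= y_pred,
                    quantile * (y_true - y_pred),
                    (quantile - 1) * (y_true - y_pred))

y_true = 100

# Forecast error of 5 in both directions, at the 90th quantile
loss_under = pinball_loss(y_true, 95, 0.9)   # underprediction: 0.9 * 5 = 4.5
loss_over = pinball_loss(y_true, 105, 0.9)   # overprediction: 0.1 * 5 = 0.5

# Underprediction is penalized 9 times more severely
print(round(float(loss_under / loss_over), 6))
```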

Similar calculations can be easily repeated for the other extreme, such as the 10th quantile.
When to use
Now that we know what quantile/pinball loss is, let's consider its use cases. We know that it is used for evaluating quantile regression and probabilistic forecasts, but when would we want to use it in the first place? Below are a few examples:
- Predicting quantiles – When we want to predict certain quantiles of a distribution instead of the mean. This can be useful in domains such as financial risk management or weather forecasting.
- Accounting for asymmetry in losses – In some cases, the costs of overestimating and underestimating are not equal. For example, in supply chain management, underestimating demand might lead to lost sales. On the other hand, overestimating might lead to excess inventory. We can use the quantile loss to optimize for such asymmetric scenarios.
- Predicting intervals – For example, in time series forecasting, predicting a range (for example, between the 10th and 90th quantile) can provide more useful information than a point forecast.
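The last use case can be sketched by fitting two quantile models and combining them into an interval. The snippet below uses scikit-learn's GradientBoostingRegressor with loss="quantile" (which optimizes the pinball loss) on synthetic data, purely for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data for illustration
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(500, 1))
y = 10 * X.ravel() + rng.normal(0, 5, size=500)

# Two quantile models form an 80% prediction interval (10th to 90th quantile)
lower = GradientBoostingRegressor(loss="quantile", alpha=0.1).fit(X, y)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(X, y)

# Fraction of observations falling inside the interval, roughly 0.8
inside = (y >= lower.predict(X)) & (y <= upper.predict(X))
coverage = np.mean(inside)
print(coverage)
```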

Wrapping up
In this article, we have covered quantile loss, what it is, and when to use it. The main takeaways are:
- Quantile loss is an asymmetric, cost-sensitive loss function used to train models that predict specific quantiles of a target variable's distribution.
- For the median, it is equivalent to MAE.
- The further α is from 0.5, the stronger the incentive for under- or overprediction.
As always, any constructive feedback is more than welcome. You can reach out to me on LinkedIn, Twitter, or in the comments. You can find the code used in this article here.
Until next time