4 Ways to Quantify Fat Tails with Python

Author:Murphy | View: 20678 | Time: 2025-03-22 23:46:43

This is the third article in a series on Power Laws and Fat Tails. In the previous post, we explored how to detect power laws from empirical data. While this technique can be handy, Fat Tails go beyond simply fitting data to a power law distribution. In this article, I will break down 4 ways we can quantify fat tails and share example Python code analyzing real-world data.

Note: If you are unfamiliar with terms like Power Law distribution or Fat Tail, review this article as a primer.

In the first article of this series, we introduced the idea of fat tails, which describes the degree to which rare events drive the aggregate statistics of a distribution. We saw an extreme example of fat tails via the Pareto distribution where, for example, 80% of sales are generated by 20% of customers (and 50% of sales are generated by just 1% of customers).
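The 80-20 and 50-1 figures can be checked with a few lines of arithmetic. The sketch below assumes an idealized Pareto population, where the share of the total held by the top fraction p is p^(1 − 1/α). The names `ALPHA_80_20` and `top_share` are illustrative and do not come from the series.

```python
import math

# Tail index for which the top 20% hold exactly 80% of the total (Pareto "80-20")
ALPHA_80_20 = math.log(5) / math.log(4)  # roughly 1.16


def top_share(p, alpha=ALPHA_80_20):
    """Share of the total held by the top fraction p of a Pareto(alpha) population (alpha > 1)."""
    return p ** (1 - 1 / alpha)


print(f"alpha = {ALPHA_80_20:.3f}")
print(f"top 20% share: {top_share(0.20):.1%}")
print(f"top 1% share:  {top_share(0.01):.1%}")
```

Under this assumption the top 1% hold a little over half of the total, consistent with the rounded 50% figure above.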

Pareto, Power Laws, and Fat Tails

Although Pareto (and more generally Power Law) distributions give us a salient example of fat tails, this is a more general notion that lives on a spectrum ranging from thin-tailed (e.g. a Gaussian) to very fat-tailed (e.g. Pareto 80–20).

The spectrum of Fat-tailedness. Image by author.

This view of fat-tailedness provides us with a more flexible and precise way of categorizing data than simply labeling it as a Power Law (or not). However, this raises the question: how do we define fat-tailedness?

4 Ways to Quantify Fat Tails

While there is no "true" measure of fat-tailedness, there are a few heuristics (i.e. rules of thumb) we can use in practice to quantify how fat-tailed data are. Here, we review 4 such heuristics. We start by introducing each technique conceptually and then dive into example Python code.

Heuristic 1: Power Law Tail Index

The fattest fat tails appear in Power Law distributions, where the smaller a Power Law's tail index (i.e. α), the fatter its tail, as illustrated in the image below.

Example Power Law distributions with various α values. Image by author.

This observation that smaller tail indexes imply fatter tails naturally motivates us to use α to quantify fat tails. In practice, this boils down to fitting a power law distribution to a given dataset and extracting the estimated α value.

While this is a straightforward approach, it has one obvious limitation. Namely, the approach will break down when working with data that poorly fits a power law.
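As a minimal sketch of this idea (not the author's implementation), the tail index of Pareto-like data can be estimated by maximum likelihood once a lower cutoff is chosen. The function name `estimate_tail_index` and the choice `x_min = 1` are assumptions made for illustration, and the data are synthetic. Note that libraries such as `powerlaw` may report the exponent of the density rather than the tail index, in which case the two differ by 1, so it is worth checking the convention before comparing numbers.

```python
import numpy as np


def estimate_tail_index(x, x_min):
    """Maximum-likelihood (Hill-type) estimate of alpha from observations >= x_min."""
    tail = np.asarray(x, dtype=float)
    tail = tail[tail >= x_min]
    return tail.size / np.sum(np.log(tail / x_min))


rng = np.random.default_rng(0)
# Generator.pareto draws from a Lomax distribution; adding 1 gives a classical Pareto with x_min = 1
samples = rng.pareto(1.16, size=100_000) + 1
alpha_hat = estimate_tail_index(samples, x_min=1.0)
print(f"estimated alpha: {alpha_hat:.3f}")
```

On real data the cutoff is not known in advance, and the estimate can be sensitive to it, which is one reason this heuristic struggles when a power law fits poorly.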

Heuristic 2: Kurtosis (i.e. non-Gaussianity)

The opposite of a fat tail is a thin tail (i.e. rare events are so rare they are negligible). A thin-tailed exemplar is the beloved Gaussian distribution, where the probability of an event 6 sigma away from the mean is about 1 in a billion.

This inspires another measure of fat tails by quantifying how "UN-Gaussian" the data are. We can do this via so-called non-Gaussianity measures. While we could devise many such measures, the most popular is Kurtosis, defined by the expression below.

Definition of Kurtosis according to ref [1] and [2]. Image by author.

Kurtosis is driven by values far from the center (i.e. the tails). Thus, the larger the kurtosis, the fatter the tail.

This measure tends to work well when all the moments are finite [3]. One major limitation, however, is that Kurtosis is not defined for some distributions, e.g. Pareto with α ≤ 4, which makes it useless for much fat-tailed data.
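To make the measure concrete, here is a small sketch that computes kurtosis as the fourth standardized moment, a convention under which a Gaussian scores 3. This convention is an assumption, since the definition the article cites may differ (for example by subtracting 3). The helper `sample_kurtosis` and the synthetic samples are illustrative only. `scipy.stats.kurtosis(x, fisher=False)` uses the same convention.

```python
import numpy as np


def sample_kurtosis(x):
    """Fourth standardized moment: mean(((x - mean) / std) ** 4). Equals 3 for a Gaussian."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4)


rng = np.random.default_rng(0)
gaussian = rng.normal(size=200_000)
lognormal = rng.lognormal(mean=0.0, sigma=1.0, size=200_000)

print(f"Gaussian kurtosis:   {sample_kurtosis(gaussian):.2f}")
print(f"Log-normal kurtosis: {sample_kurtosis(lognormal):.2f}")
```

The log-normal sample scores far above the Gaussian one, but its estimate also varies much more from run to run, which hints at why kurtosis becomes unreliable as tails get fatter.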

Heuristic 3: Log-normal's σ

In past articles of this series, we discussed the Log-normal distribution, defined by the probability density function below.

The probability density function of log-normal distribution [4]. Image by author.

As we saw before, this distribution is a bit mischievous because it can appear Gaussian-like for low σ, yet Pareto-like at high σ. This naturally provides another way to quantify fat tails, where the larger the σ, the fatter the tail.

We can obtain this measure in much the same way as Heuristic 1. Namely, we fit a log-normal distribution to our data and extract the fit's σ value. While this is a simple procedure, it (like Heuristic 1) breaks down when the log-normal fit does not explain the underlying data well.
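One simple way to perform such a fit, sketched here under the assumption that the data are strictly positive, is to use the maximum-likelihood estimates: the mean and standard deviation of the logged values. The function `fit_lognormal` is a hypothetical helper, and the article's own code may fit the distribution differently.

```python
import numpy as np


def fit_lognormal(x):
    """Maximum-likelihood estimates (mu, sigma) of a log-normal for strictly positive data."""
    log_x = np.log(np.asarray(x, dtype=float))
    return log_x.mean(), log_x.std()


rng = np.random.default_rng(0)
samples = rng.lognormal(mean=0.0, sigma=1.5, size=50_000)
mu_hat, sigma_hat = fit_lognormal(samples)
print(f"estimated mu: {mu_hat:.3f}, estimated sigma: {sigma_hat:.3f}")
```

Zeros and negative values must be dropped or handled before taking logs, since the log-normal is only defined for positive data.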

Heuristic 4: Taleb's κ

The preceding heuristics (H) started with a particular distribution in mind (i.e. H1 – Power Law and H3 – Log-normal). In practice, however, our data rarely precisely follow any particular distribution.

Moreover, comparisons using these measures may be problematic when evaluating 2 variables following qualitatively different distributions. For instance, using a power law's tail index to compare Pareto-like and Gaussian-like data may have little significance since a power law will fit Gaussian-like data poorly.

This motivates the use of non-distribution-specific measures of fat-tailedness. One such measure was proposed by Taleb in ref [3]. The proposed metric (κ) is defined for unimodal data with finite mean and takes values between 0 and 1, where 0 indicates data are maximally thin-tailed and 1 implies the data are maximally fat-tailed. It is defined according to the expression below.

Definition of Taleb's κ metric [3]. Image by author.

The metric compares two sums (say, Sₙ₀ and Sₙ), where Sₙ is the sum of n samples drawn from a particular distribution. For example, if we evaluate a Gaussian distribution and choose n=100, we would draw 100 samples from a Gaussian and sum them all together to create S₁₀₀.

M(n) in the above expression denotes the mean absolute deviation, defined according to the equation below. This measure of the dispersion around the mean tends to be more robust than the standard deviation [3][5].

Definition of mean absolute deviation from κ equation [3]. Image by author.

To simplify things, we can choose n₀=1, which reduces the metric to κ(1, n) = 2 − log(n) / log(M(n)/M(1)).

The key term here is M(n)/M(1), where M(n) quantifies the dispersion around the mean for the sum of n samples (of some distribution).

For thin-tailed distributions, dispersion grows slowly as samples are summed. For a Gaussian, M(n) scales like √n, so M(30)/M(1) ≈ √30 ≈ 5.5.

For fat-tailed data, however, M(n) grows faster with n (approaching n itself in the most extreme case), so M(30)/M(1) is much larger than its Gaussian counterpart. This is illustrated below, where the left plot shows how dispersion scales for a sum of Gaussians, and the right plot shows how it scales for a Pareto.

Scaling of M(n) and M(1) for Gaussian (left) and Pareto 80–20 (right) distributions. Notice the y-axis labels. Note: the scale of Gaussian dispersion increases due to the summing of n distributions. Image by author.

Thus, for fat-tailed data, the denominator in the κ equation will be larger than it is for thin-tailed data, making the second term on the RHS smaller and, ultimately, κ larger.

If this was all more math than you bargained for, here's the takeaway: Big κ = fat-tailed, small κ = thin-tailed.
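For readers who prefer code to equations, here is a rough simulation sketch of the n₀=1 case, κ(1, n) = 2 − log(n)/log(M(n)/M(1)). It assumes we can draw i.i.d. samples from the distribution of interest. The function names, the choice n=30, and the number of simulations are illustrative, and the implementation in the author's repo may differ.

```python
import numpy as np


def mean_abs_deviation(x):
    """Mean absolute deviation around the sample mean."""
    return np.mean(np.abs(x - np.mean(x)))


def kappa(draw, n, n_sims=20_000):
    """Estimate kappa(1, n) = 2 - log(n) / log(M(n) / M(1)) by simulation.

    `draw(size)` must return i.i.d. samples with the requested shape.
    """
    samples = draw((n_sims, n))
    m_1 = mean_abs_deviation(samples.ravel())      # dispersion of single draws
    m_n = mean_abs_deviation(samples.sum(axis=1))  # dispersion of sums of n draws
    return 2 - np.log(n) / np.log(m_n / m_1)


rng = np.random.default_rng(0)
kappa_gaussian = kappa(lambda size: rng.normal(size=size), n=30)
kappa_pareto = kappa(lambda size: rng.pareto(1.16, size=size) + 1, n=30)

print(f"Gaussian kappa:     {kappa_gaussian:.2f}")
print(f"Pareto 80-20 kappa: {kappa_pareto:.2f}")
```

The Gaussian estimate lands near 0 because M(n)/M(1) is close to √n, while the Pareto estimate sits much closer to 1. With very fat tails the estimate is noisy, so it is worth repeating the simulation with different seeds before reading much into a single number.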

Example Code: Quantifying the Fat-tailedness of (Real-world) Social Media Data

With the conceptual stuff out of the way, let's see what using these heuristics looks like in practice. Here, we will use each approach described above to analyze the same data from the previous article of this series.

The data are from my social media accounts, which include monthly followers gained on Medium, earnings per YouTube video, and daily impressions on LinkedIn. The data and code are freely available at the GitHub repo.

YouTube-Blog/power-laws/3-quantifying-fat-tails at main · ShawhinT/YouTube-Blog

We start by importing some helpful libraries.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import powerlaw
from scipy.stats import kurtosis

Next, we will load each dataset and store them in a dictionary.

filename_list = ['medium-followers', 'YT-earnings', 'LI-impressions']

df_dict = {}

for filename in filename_list:
    df = pd.read_csv('data/'+filename+'.csv')
    df = df.set_index(df.columns[0]) # set index
    df_dict[filename] = df

At this point, looking at the data is always a good idea. We can do that by plotting histograms and printing the top 5 records for each dataset.

for filename in filename_list:
    df = df_dict[filename]

    # plot histograms (function below is defined in notebook on GitHub)
    plot_histograms(df.iloc[:,0][df.iloc[:,0]>0], filename, filename.split('-')[1])
    plt.savefig("images/"+filename+"_histograms.png")

    # print top 5 records
    print("Top 5 Records by Percentage")
    print((df.iloc[:,0]/df.iloc[:,0].sum()).sort_values(ascending=False)[:5])
    print("")