A Story of Long Tails: Why Uncertainty in Marketing Mix Modelling is Important

Author:Murphy  |  View: 27327  |  Time: 2025-03-22 19:30:26

"Details matter. It's worth waiting to get it right." — Steve Jobs

Image generated by author using DALL-E

What if the most valuable insights from your Marketing Mix Model (MMM) are hiding in what we usually consider uncertainty or noise?

What would happen if we could look at the results of the MMM models that we are using today from a different angle?

Those of us who have built MMM models using Bayesian hierarchical models have seen that these models¹ provide a lot of information about each parameter we define. Following rigorous and widely validated statistical practice, we choose, for example, the mean (sometimes the median) of the posterior distribution as the value of a channel's influence. Then we derive actionable insights from this value. However, Bayesian analysis actually outputs a probability distribution of values, and its tails are frequently long, populated by rare occurrences and exceptions. If we underestimate the information contained in these tails, we lose a valuable opportunity: looked at through the proper lens, those long tails hold very valuable insights. Most users turn to MMM models to quantify the influence of each channel on, for example, monthly sales or the number of units sold. That is just the tip of the iceberg. These models have much more to say.

**This post strives to explain where these exceptions forming the long tails come from and what they might mean, using Complex Systems theory** formalisms as the lens to look at MMM models.

'… but still there were a few things to be said.'

Death in the Afternoon, Ernest Hemingway

Uncertainty in Marketing Mix Modelling: the Gaussian average

One of the most important parameters to estimate in an MMM is the channel influence, or effect, on a given KPI (our dependent variable). This parameter is pivotal for many advertising strategies. Most quantitative MMM analyses assume Gaussian (normal) distributions with finite means and variances [4]. This is especially important when we work with Bayesian methods for MMM [23]. In these methods, we try to infer the probability distribution (posterior) of these parameters (channel influence, lag, carryover, etc.) from the observed data, starting from a preset "guiding" distribution (prior). Then we treat the inferred distribution according to Gaussian rules and choose its mean as the representative value for each of these parameters.

Figure 1. Ideal Gaussian distribution and long tails. Illustration by author.

Nevertheless, reality is stubborn and very often (most of the time) shows rare occurrences and exceptions, which appear in distribution curves as long tails [3].

These long tails (the signature of heavy-tailed distributions) are not there by error; they have an important meaning.
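To make the contrast between the "Gaussian average" and the tails concrete, here is a minimal sketch in Python. The posterior draws are simulated from a log-normal distribution purely as an assumption (it is a convenient right-skewed shape, not the output of any real MMM), and the helper name `summarize_posterior` is hypothetical.

```python
import random
import statistics


def summarize_posterior(samples, tail_prob=0.05):
    """Return the mean, median and upper-tail quantile of posterior draws."""
    ordered = sorted(samples)
    # Position of the (1 - tail_prob) empirical quantile in the sorted draws.
    idx = min(len(ordered) - 1, int((1.0 - tail_prob) * len(ordered)))
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "upper_quantile": ordered[idx],
    }


rng = random.Random(42)
# Hypothetical posterior draws for one channel's effect (assumption:
# log-normal shape, chosen only because it is right-skewed).
draws = [rng.lognormvariate(0.0, 1.0) for _ in range(20_000)]

summary = summarize_posterior(draws)
# Share of draws lying above twice the posterior mean.
share_above = sum(d > 2 * summary["mean"] for d in draws) / len(draws)

print(summary)
print(f"share of draws above 2x the mean: {share_above:.3f}")
```

In a skewed posterior like this one, the mean sits above the median and a non-trivial share of draws lies far beyond it, which is exactly the information a single point estimate throws away.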

Power laws and complex systems

The empirical investigation of complex systems in the real world has shown a hidden pattern that holds true in a wide range of situations [18]. Power laws or scaling laws can be seen as laws of nature describing complex systems [12][30]. We call them scaling laws because they maintain their proportions regardless of scale. A scaling law is a functional relationship in which one quantity varies as a power of another, governed by a scaling parameter α and a constant C.

Equation 1: Scaling law function, y = C·x^α.

Therefore, y always changes as a power of x, and a relative change in x leads to a proportional relative change in y regardless of their initial values. This law is widespread in nature, appearing in phenomena such as the size of earthquakes, the number of heartbeats of mammals in relation to their weight, or the empirical wealth distribution in the US. And yes, it also shows up in the performance of marketing channels.
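The "regardless of their initial values" part can be checked numerically. The sketch below assumes the scaling law has the form y = C·x^α and uses arbitrary illustrative values for C and α; it shows that doubling x multiplies y by the same factor, 2^α, whether we start from a small or a large x.

```python
def scaling_law(x, c, alpha):
    """Power-law relation y = c * x**alpha."""
    return c * x ** alpha


# Illustrative values only; they are not taken from any dataset.
C, ALPHA = 3.0, 0.75

# Doubling x at very different starting points.
starting_points = (1.0, 10.0, 1000.0)
ratios = [
    scaling_law(2 * x, C, ALPHA) / scaling_law(x, C, ALPHA)
    for x in starting_points
]

for x, ratio in zip(starting_points, ratios):
    print(f"x = {x:>7}: y(2x) / y(x) = {ratio:.4f}")
print(f"2 ** alpha             = {2 ** ALPHA:.4f}")
```

The constant C cancels out of the ratio, which is why only the scaling parameter α determines how the system responds to a change of scale.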

Figure 2. Scaling law relation (top). Scaling law in mammals' metabolic rate (bottom). Plot by author. Data from "Painter PR. Allometric scaling of the maximum metabolic rate of mammals: oxygen transport from the lungs to the heart is a limiting step. Theor Biol Med Model. 2005 Aug 11;2:31. doi: 10.1186/1742-4682-2-31. PMID: 16095539; PMCID: PMC1236962." used under the terms of the Creative Commons Attribution License.

Let's consider a digital campaign in a given channel: a few posts get massive engagement (the head). Many posts get moderate engagement (the body). But most posts get minimal engagement (the tail). This behavior follows a scaling law, and this pattern is natural and can be expected. It will contain valuable information and will repeat at different scales. In the conventional MMM approach, we try to reduce variance and focus on averages. But we could decide to recognize that this variance follows predictable patterns and use this information to our benefit.
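As a rough illustration of this head, body and tail pattern, the sketch below simulates post engagement from a Pareto distribution and then recovers its exponent from the data. Everything here is an assumption made for the example: the Pareto shape of 1.5, the sample size, and the helper names `pareto_shape_mle` and `top_share` are hypothetical, and real engagement data would need a proper goodness-of-fit check before claiming a power law.

```python
import math
import random


def pareto_shape_mle(values, x_min=1.0):
    """Maximum-likelihood estimate of the Pareto shape for values >= x_min."""
    tail = [v for v in values if v >= x_min]
    return len(tail) / sum(math.log(v / x_min) for v in tail)


def top_share(values, fraction):
    """Fraction of the total carried by the largest `fraction` of items."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


rng = random.Random(7)
TRUE_SHAPE = 1.5  # hypothetical exponent, for illustration only
# Simulated engagement per post (arbitrary units, minimum value 1).
engagement = [rng.paretovariate(TRUE_SHAPE) for _ in range(50_000)]

shape_hat = pareto_shape_mle(engagement)
head_share = top_share(engagement, 0.01)

print(f"estimated shape: {shape_hat:.3f}")
print(f"share of engagement from the top 1% of posts: {head_share:.1%}")
```

The point of the exercise is that the variance is not noise to be averaged away: a single estimated exponent describes how engagement is split between the few massive posts and the many minimal ones.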

When we estimate channel influence in an MMM, we summarize it with the mean μ, the average effect we will use for future planning decisions. The standard deviation, σ, represents the uncertainty, and the ratio σ/μ often increases with μ. This suggests a scaling law in which larger effects come with disproportionately larger variations. The pattern repeats at different advertising investment levels; we call this scale-free behavior.
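One simple way to check for this kind of relation is to fit σ = c·μ^β on a log-log scale: an exponent β above 1 means that σ/μ grows with μ. The sketch below does this with ordinary least squares. The (μ, σ) pairs are made-up numbers for five hypothetical channels, not results from a real model, and `fit_power_law` is a hypothetical helper.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(c) + beta * log(x); returns (c, beta)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    # Slope of the regression line in log-log space.
    beta = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum(
        (a - mx) ** 2 for a in lx
    )
    # Intercept, mapped back from log space.
    return math.exp(my - beta * mx), beta


# Hypothetical posterior means and standard deviations for five channels.
mu = [0.5, 1.0, 2.0, 4.0, 8.0]
sigma = [0.10, 0.25, 0.62, 1.55, 3.90]

c_hat, beta_hat = fit_power_law(mu, sigma)
cv = [s / m for s, m in zip(sigma, mu)]  # coefficient of variation per channel

print(f"sigma ~ {c_hat:.3f} * mu ** {beta_hat:.3f}")
print("sigma / mu per channel:", [round(v, 3) for v in cv])
```

With only a handful of channels such a fit is indicative at best, but it turns "uncertainty grows with effect size" into a number that can be tracked across model runs.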

Instead of treating high uncertainty as a sign of poor modelling or bad data quality and focusing only on mean effects, we can treat it as a natural property of the system.

Self-organized criticality systems

Figure 3. Avalanche dynamics in sand piles can be explained with self-organized criticality (SOC) theory. Illustration by author.

In statistical mechanics, the theory of self-organized criticality (SOC) unifies the power-law behavior observed in complex systems [22][27][28]. The basic idea behind SOC is that a complex system will naturally organize itself into a state at a critical point, on the edge between two different regimes, without intervention from outside the system [33]. This is known as a phase transition because the system moves from one regime to another once it has reached the critical point. In the sand pile example in Figure 3, the pile grows until it reaches a certain height (the critical point). Beyond this point, the pile stops growing and the sand starts to roll down, giving rise to a different, avalanche dynamics. This illustrates the phase transition concept.
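The sand pile picture can be simulated in a few lines. The sketch below uses the classic Bak-Tang-Wiesenfeld sandpile, the textbook model usually associated with SOC; the article does not name a specific model, so this choice, the grid size, and the number of grains are all assumptions. Grains are dropped one at a time on a grid, and any cell holding four or more grains topples, passing one grain to each neighbor.

```python
import random


def run_sandpile(size=20, grains=10_000, seed=0):
    """Drop grains one at a time; return the final grid and avalanche sizes."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    avalanches = []
    for _ in range(grains):
        r, c = rng.randrange(size), rng.randrange(size)
        grid[r][c] += 1
        topplings = 0
        unstable = [(r, c)] if grid[r][c] >= 4 else []
        while unstable:
            i, j = unstable.pop()
            if grid[i][j] < 4:
                continue  # already relaxed by an earlier toppling
            grid[i][j] -= 4
            topplings += 1
            if grid[i][j] >= 4:
                unstable.append((i, j))
            for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                # Grains pushed over the edge of the grid are lost.
                if 0 <= ni < size and 0 <= nj < size:
                    grid[ni][nj] += 1
                    if grid[ni][nj] >= 4:
                        unstable.append((ni, nj))
        avalanches.append(topplings)  # avalanche size = number of topplings
    return grid, avalanches


grid, avalanches = run_sandpile()
quiet = avalanches.count(0) / len(avalanches)
print(f"grains that caused no avalanche: {quiet:.1%}")
print(f"largest avalanche: {max(avalanches)} topplings")
```

Nobody tunes the pile to its critical state: it gets there by itself, and once there the same single grain can cause nothing at all or an avalanche that crosses the whole grid.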

Highly optimized tolerance mechanism

This natural organization occurs in complex systems consisting of many interacting components, as in the case of the sand pile. The mechanism is known as highly optimized tolerance (HOT). Most complex systems consist of many heterogeneous components, which often have a complex structure and behavior of their own. The interaction of these components forms a larger and more robust system with a specific behavior [8]. The system shows global dynamics that result from the interaction of many different dynamics. These system-level dynamics are the optimal ones, capturing all the dynamics that build the system.

Figure 4. Parts of a eukaryotic cell. The operation of a cell is a clear example of the highly optimized tolerance (HOT) mechanism. The cell has different components of high complexity, for example, the nucleus. Every part of the cell has its own specialization. This diversity of functions makes the cell more robust. Parts from https://en.wikipedia.org/wiki/Eukaryote. Illustration by author.

In media advertising, multiple factors or mechanisms, such as different audiences, contexts, touchpoints, or seasonality effects, contribute to the wide range of responses, or long tails, observed when modelling advertising channel effectiveness. All these effects have complex dynamics of their own, such as the social influence effect on audiences or the network effect. This diversity of mechanisms makes the channel influence (the global optimal mechanism) more robust, just as the different components of the cell perform different specialized functions in Figure 4. This behavior is reflected in the long tails of the probability distribution of a channel's influence. The influence value clusters around an average, but we also get "specialized" mechanisms represented by points far from the main body of values (the long tails). These unusual responses or exceptions are the channel's key mechanism for adapting the system (channel influence) to different situations in an optimal way. In other words, without these long tails, the channel influence could not be optimal.
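A small numerical experiment shows how heterogeneous mechanisms alone can produce long tails. In the sketch below, which is a toy illustration rather than a HOT model, one response comes from a single Gaussian mechanism and the other from a mix of three Gaussian sub-mechanisms with the same average but different spreads. The spreads in `SUB_SCALES` are hypothetical, and excess kurtosis is used as a simple indicator of tail weight (it is about 0 for a Gaussian).

```python
import random
import statistics


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (about 0 for a Gaussian)."""
    m = statistics.fmean(values)
    var = statistics.fmean([(v - m) ** 2 for v in values])
    fourth = statistics.fmean([(v - m) ** 4 for v in values])
    return fourth / var ** 2 - 3.0


rng = random.Random(11)
N = 30_000

# One homogeneous mechanism: a single Gaussian response.
single = [rng.gauss(1.0, 0.5) for _ in range(N)]

# Three hypothetical sub-mechanisms sharing the same mean but with very
# different spreads; each response is produced by one of them at random.
SUB_SCALES = (0.1, 0.5, 2.0)
mixed = [rng.gauss(1.0, rng.choice(SUB_SCALES)) for _ in range(N)]

k_single = excess_kurtosis(single)
k_mixed = excess_kurtosis(mixed)

print(f"excess kurtosis, single mechanism: {k_single:.2f}")
print(f"excess kurtosis, mixed mechanisms: {k_mixed:.2f}")
```

Both series cluster around the same average, yet the mixed one has far more mass in the extremes: the tails come from the diversity of the mechanisms, not from measurement error.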

Figure 5. Examples of models in psychology that can be used to model consumer behavior [1][2][6][16][25]. These models can also be understood as a _Highly Optimized Tolerance_ (HOT) mechanism, in which the complete mechanism can be divided into several sub-mechanisms of high complexity. Illustration by author.

System properties

The most important takeaway is that long tails in channel influence don't just arise when similar effects interact and add up. That case is called self-similarity, a property of systems that have similar structures at different scales [7]. When different mechanisms (let's call them sub-systems) interact with each other, we call it self-dissimilarity. Both concepts are important for advertising. When our advertising campaign, seen as a system, shows self-similarity in its inner mechanisms, it can be understood as a very effective system. The contrary could mean that we have very different mechanisms that perform less effectively, but whose diversity offers several other advantages.

**What is important to take into account is that high uncertainty in channel influence isn't a problem with the measurement or a statistical reality**; it's an indicator of a robust, optimal complex system made up of diverse mechanisms that work together.

Percolation

An important concept from statistical mechanics is the percolation mechanism [19]. Think about how water flows through different types of surfaces. In a random, porous material like a sponge, water spreads pretty evenly in all directions. But we can also design an irrigation system for a garden to bring water accurately where it is needed. This will not be a random system because main channels and smaller branches have been specifically designed to reach specific points while being efficient with water. This design works as a HOT system. Marketing channels work similarly. In some cases, like a viral social media campaign, information might spread somewhat randomly through the network. But when we accurately design a marketing campaign, the spread of influence follows optimized paths, like in the irrigation system. This is where the parameter

