Media Mix Modeling: Technical Guideline to Avoid Pitfalls for Data Scientists

Author:Murphy | View: 20733 | Time: 2025-03-23 19:07:02

TLDR: Media Mix Modeling (MMM) is a statistical approach to marketing measurement, but, like other techniques, it is not a "one size fits all" solution. I will cover essential checkpoints that help data scientists improve their modeling and get better results, the differences between advertising reports and MMM, and the distinctions between Multi-Touch Attribution (MTA) and MMM.

Introduction

MMM is a statistical method that helps advertisers understand and measure the return on investment (ROI) of their media spend and optimize their media budgets.

Data scientists at companies with large advertising budgets in the beverage, consumer goods, auto, and fashion industries have been working on improving MMM. Ad tech companies such as Google and Meta have also been actively investing in MMM recently, because privacy regulations such as GDPR and platform changes such as Apple's IDFA deprecation have reduced the accuracy of tracking in the digital world.

For more information, please refer to this article I posted previously.

Media Mix Modeling: How to measure the effectiveness of advertising with Python & LightweightMMM

After publishing the above article, I received a lot of positive feedback and several questions from Towards Data Science readers. Common questions included:

(1) What are the checkpoints for a successful MMM project?
(2) How do advertising reports and MMM differ?
(3) What are the differences between Multi-Touch Attribution (MTA) and MMM?

In this article, I will address these questions and provide insights to help you better understand these concepts.

(1) What are the checkpoints for a successful MMM project?

To make sure an MMM is reliable and accurate, data scientists should review the following checkpoints before trusting its results. Some of these ideas are inspired by Google's research paper: Chan, D., & Perry, M. (2017). Challenges and Opportunities in Media Mix Modeling.

✔️ Sufficient data points:

Make sure there are enough data points to model each ad channel and its relationship with sales.
Typically, MMMs require at least two years of weekly data, which also covers two full yearly seasonal cycles. If you don't have that much history, daily data can be acceptable, but you will then need to review outliers more carefully.
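As a rough illustration, the check below counts weekly observations, the number of calendar weeks they span, and the observations available per model parameter. It is a minimal sketch that assumes one row per week; the function name `check_weekly_data`, the 104-week minimum, and the 10-observations-per-parameter threshold are hypothetical rules of thumb, not part of any MMM library.

```python
from datetime import date, timedelta


def check_weekly_data(week_starts, n_params, min_weeks=104, min_obs_per_param=10):
    """Rough data-sufficiency check for a weekly MMM dataset (hypothetical thresholds)."""
    n_obs = len(week_starts)
    # Calendar weeks between the first and last observation, inclusive.
    span_weeks = (max(week_starts) - min(week_starts)).days // 7 + 1
    obs_per_param = n_obs / n_params
    return {
        "n_obs": n_obs,
        "span_weeks": span_weeks,
        "missing_weeks": span_weeks - n_obs,  # gaps in the weekly series
        "obs_per_param": obs_per_param,
        "ok": n_obs >= min_weeks and obs_per_param >= min_obs_per_param,
    }


# Two years of weekly dates; n_params=8 assumes, for example,
# 5 channels + intercept + trend + one seasonality term.
two_years = [date(2022, 1, 3) + timedelta(weeks=i) for i in range(104)]
print(check_weekly_data(two_years, n_params=8))
print(check_weekly_data(two_years[:52], n_params=8))  # only one year: not ok
```

Counting parameters rather than channels alone is a deliberate choice: adstock, saturation, and seasonality terms all consume degrees of freedom.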

✔️ Select appropriate input variables:

Variability in the input data is crucial. For example, if spending on media channel X stays unchanged during the entire period, the model has little information with which to estimate its influence on sales. In such cases, it may be better to exclude that channel from the input variables.
Keep the factors that meaningfully affect sales. For instance, if spending on magazine X is far smaller than on other channels and occurred only briefly, it may be more appropriate to exclude it.
Input variables can be selected both before modeling and during model selection.
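A simple pre-modeling screen can flag both situations above. The sketch below assumes weekly spend per channel as plain lists; it uses the coefficient of variation (standard deviation divided by mean) as the variability measure and the share of total spend as the size measure. The function name, channel names, and the 0.1 and 2% cut-offs are hypothetical and should be tuned to your own data.

```python
from statistics import mean, pstdev


def screen_channels(spend, min_cv=0.1, min_share=0.02):
    """Flag channels with too little variation or too small a share of total spend."""
    total = sum(sum(series) for series in spend.values())
    report = {}
    for channel, series in spend.items():
        avg = mean(series)
        # Coefficient of variation: how much spend moves relative to its average level.
        cv = pstdev(series) / avg if avg > 0 else 0.0
        share = sum(series) / total
        report[channel] = {
            "cv": round(cv, 3),
            "share": round(share, 3),
            "keep": cv >= min_cv and share >= min_share,
        }
    return report


weekly_spend = {
    "tv": [100, 150, 80, 200],
    "search": [50, 60, 40, 70],
    "magazine_x": [5, 5, 5, 5],  # flat spend: nothing for the model to learn from
}
for channel, stats in screen_channels(weekly_spend).items():
    print(channel, stats)
```

A "keep": False flag is a prompt for review rather than an automatic drop; a small channel that is central to a business question may still deserve a place in the model.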

✔️ Address correlated input variables:

Check for multicollinearity between input variables, which can lead to high-variance coefficient estimates and poor attribution of sales to specific ad channels.
For example, if you always allocate the same budget to Meta and TikTok ads, their spend series move together and the model cannot reliably separate the impact of the two channels.
To address correlated variables, use regularization techniques (such as ridge regression) or variable selection methods.
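One common diagnostic is the variance inflation factor (VIF). The NumPy sketch below uses simulated data. It computes VIFs as the diagonal of the inverse correlation matrix, which is equivalent to 1 / (1 - R²) from regressing each input on the others, and fits a closed-form ridge regression as one example of regularization. The channel names, simulated numbers, and the usual "VIF above 10" warning level are illustrative assumptions.

```python
import numpy as np


def vif(X):
    """VIF for each column: the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


def ridge(X, y, alpha):
    """Closed-form ridge on standardized inputs; the centered target leaves the intercept unpenalized."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + alpha * np.eye(p), Xs.T @ yc)


rng = np.random.default_rng(42)
n = 104  # two years of weekly data
meta = rng.normal(100, 20, n)
tiktok = meta + rng.normal(0, 2, n)  # TikTok budget moves in lockstep with Meta
tv = rng.normal(200, 40, n)          # independent channel
X = np.column_stack([meta, tiktok, tv])
sales = 500 + 1.5 * meta + 1.0 * tiktok + 0.8 * tv + rng.normal(0, 30, n)

print("VIF:", dict(zip(["meta", "tiktok", "tv"], vif(X).round(1))))
print("OLS (alpha=0):   ", ridge(X, sales, alpha=0.0).round(2))
print("ridge (alpha=10):", ridge(X, sales, alpha=10.0).round(2))
```

Meta and TikTok get very large VIFs while TV stays near 1. Ridge shrinks the unstable pair toward each other instead of letting one absorb the other's effect, but it cannot recover the true split. Only budget variation (or an experiment) can do that.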

✔️ Control for selection bias:

Be mindful of potential selection bias, such as seasonality, where ad spend and sales rise at the same time for reasons other than the ads.
For instance, higher sales in November may be driven not by advertising but by seasonal consumer demand or Black Friday sales.
In my experience, internal promotional or pricing data can serve as valuable control variables that stand in for unobservable demand.
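To see why control variables matter, the NumPy simulation below raises both ad spend and baseline sales in late November. It then estimates the spend effect by ordinary least squares with and without a holiday indicator. All numbers are made up, the true spend effect is set to 2.0, and the weeks-46-to-48 holiday flag is a hypothetical stand-in for Black Friday or promotion data.

```python
import numpy as np

rng = np.random.default_rng(0)
weeks = np.arange(156)  # three years of weekly data
week_of_year = weeks % 52
# Hypothetical late-November flag (Black Friday period).
holiday = ((week_of_year >= 46) & (week_of_year <= 48)).astype(float)

spend = 100 + 50 * holiday + rng.normal(0, 10, weeks.size)                  # budget is raised for the holidays
sales = 1000 + 2.0 * spend + 300 * holiday + rng.normal(0, 10, weeks.size)  # true spend effect = 2.0


def ols(columns, y):
    """Least-squares fit with an intercept; returns [intercept, coef_1, ...]."""
    X = np.column_stack([np.ones_like(y)] + columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


naive = ols([spend], sales)[1]
controlled = ols([spend, holiday], sales)[1]
print(f"spend effect without holiday control: {naive:.2f}")
print(f"spend effect with holiday control:    {controlled:.2f}")
```

Without the control, the spend coefficient absorbs part of the holiday demand and lands well above 2.0. Adding the indicator brings it back close to the true value.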

✔️ Validate extrapolation assumptions:

Extrapolation assumptions are the assumptions a model relies on when making predictions or drawing conclusions beyond the range of the available data.
Be careful when answering business questions such as "What if I increase my spend on ad channel X tenfold?" or "What happens if I suddenly stop spending on ad channel X?" Both scenarios may lie far outside the spend levels the model has ever seen.
Test the model's performance under different scenarios, and be cautious when interpreting results that require significant extrapolation.
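Before answering a what-if question, it helps to check whether the planned spend falls inside the historically observed range for each channel. The sketch below assumes spend history as plain lists per channel. The function name, channel names, and numbers are hypothetical.

```python
def extrapolation_report(history, scenario):
    """Compare planned spend per channel with the range observed in the training data."""
    report = {}
    for channel, planned in scenario.items():
        observed = history[channel]
        lo, hi = min(observed), max(observed)
        report[channel] = {
            "observed_range": (lo, hi),
            "planned": planned,
            "ratio_to_max": planned / hi if hi > 0 else float("inf"),
            "extrapolating": planned < lo or planned > hi,
        }
    return report


history = {"search": [40, 55, 60, 48, 70], "tv": [100, 120, 90, 150, 110]}
# "Multiply search spend tenfold" and "stop TV entirely" (hypothetical plan).
scenario = {"search": 10 * max(history["search"]), "tv": 0}
for channel, row in extrapolation_report(history, scenario).items():
    print(channel, row)
```

Both scenarios are flagged. Any answer the model gives for them depends on the assumed response-curve shape (for example, saturation) rather than on observed data.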

✔️ Test model performance:

Validate the MMM by comparing its predictions with actual sales and marketing data.
R-squared (R²) and mean absolute percentage error (MAPE) are well-known goodness-of-fit measures for MMM. As a rule of thumb, an R² above 0.8 is considered good, and MAPE should be 20% or lower.
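Both metrics take only a few lines to compute. Ideally, compute them on weeks held out from training, since in-sample fit can look better than the model really is. The actual and predicted values below are hypothetical.

```python
def r_squared(actual, predicted):
    """Share of the variance in actual sales explained by the model."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error in percent; undefined if any actual value is 0."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical holdout weeks: actual sales vs. model predictions.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]
r2, err = r_squared(actual, predicted), mape(actual, predicted)
print(f"R² = {r2:.3f}, MAPE = {err:.2f}%")  # R² = 0.992, MAPE = 5.21%
print("meets rule of thumb:", r2 > 0.8 and err <= 20)
```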

✔️ Conduct randomized experiments:

When using MMM to estimate the impact of changes in the media budget, it's advisable to validate the findings with randomized experiments.
For instance, by randomly dividing regions into test and control groups and adjusting ad spend only in the test regions, you can measure the incremental uplift and obtain more accurate estimates of the channel's impact.
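A minimal geo-experiment sketch, assuming region-level weekly sales: regions are split at random, and uplift is estimated with a simple difference-in-differences (change in the test group minus change in the control group). The region IDs, sales figures, and function names are hypothetical, and a real test would also need a power analysis and confidence intervals.

```python
import random
from statistics import mean


def assign_regions(regions, seed=0):
    """Randomly split regions into equal-sized test and control groups."""
    rng = random.Random(seed)
    shuffled = list(regions)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def diff_in_diff(test_pre, test_post, control_pre, control_post):
    """Change in the test group minus change in the control group."""
    return (mean(test_post) - mean(test_pre)) - (mean(control_post) - mean(control_pre))


regions = ["R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8"]  # hypothetical region IDs
test, control = assign_regions(regions)
print("test:", test, "control:", control)

# Hypothetical weekly sales averaged across each group, two weeks before and after
# spend was raised in the test regions.
lift = diff_in_diff(test_pre=[100, 102], test_post=[125, 121],
                    control_pre=[98, 100], control_post=[104, 106])
print("estimated lift per region-week:", lift)  # 16
```

The control group's change (+6) captures seasonality and other shared shocks, so only the extra +16 is attributed to the spend increase. That figure can then be compared with the incremental sales the MMM attributes to the same change.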

By addressing these checkpoints, data scientists can increase their confidence in the validity and reliability of MMM results and make more informed decisions based on the insights provided by the model.

(2) How do advertising reports and MMM differ?
