Optimizing Your Strategies with Approaches Beyond A/B Testing
In the world of digital marketing, the competition is fierce. Every company wants to craft a marketing strategy that delivers the highest value – increasing customer retention, enhancing customer satisfaction, or achieving other business objectives. However, there is no perfect marketing strategy that fits all customers. Instead, we can strive to find an improved version of our current strategy. That's where A/B testing comes into play.
In a nutshell, A/B testing is an experiment that involves randomly splitting your audience into two groups and comparing two versions of a strategy to see which one performs better. It is the traditional, statistically proven method for making data-driven decisions.
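For illustration, here is a minimal sketch of how such a random 50/50 split might be implemented; the hashing scheme is just one common choice, not a prescribed method:

```python
import hashlib

def assign_variant(visitor_id: str) -> str:
    """Deterministically split traffic 50/50 by hashing the visitor id,
    so the same visitor always sees the same version."""
    digest = int(hashlib.md5(visitor_id.encode()).hexdigest(), 16)
    return "A" if digest % 2 == 0 else "B"

print(assign_variant("visitor-42"))  # the same id always lands in the same group
```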
When to use A/B Testing
Imagine you and your partner have been running a toy e-shop for a while. One day, your partner raises concerns about the background color of the landing page. He believes that the current color does not effectively drive the conversion rate of visitors (i.e. % of visitors purchasing items).
- Currently in use (Strategy A): Orange
- Your partner's suggestion (Strategy B): Yellow
You immediately identify A/B testing as a good experimental design here, because it is relatively straightforward to determine the "winner" or "loser" from the measurement results (i.e. the conversion rate of each strategy), revealing which version of the website visitors find more engaging.
What's the problem with A/B Testing?
The result of the A/B test confirms that visitors on the yellow landing page (strategy B) convert at a higher rate. As a result, you decide to roll out the yellow landing page to the entire visitor population. However, you soon realize that this testing approach incurs a cost: with the strictly defined start and end of the A/B test, half of the traffic is allocated to the poorer strategy A for the entire period.
The cost difference between choosing the poorer strategy A and the ideal option (strategy B in this case) can be expressed in terms of regret.
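For intuition, here is a minimal sketch of this regret for a 50/50 split; the conversion rates and traffic volume below are illustrative assumptions, not real data:

```python
# Illustrative (made-up) conversion rates and traffic volume.
p_a = 0.04           # conversion rate of strategy A (orange)
p_b = 0.06           # conversion rate of strategy B (yellow)
n_visitors = 10_000  # total traffic during the test

# Half of the traffic is locked into the weaker strategy A for the whole test,
# so the expected regret is the conversions lost on that half.
expected_regret = (n_visitors / 2) * (p_b - p_a)
print(f"Expected conversions lost: {expected_regret:.0f}")  # 100
```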
This A/B test can be summarized with the following characteristics:
- It emphasizes the value of exploration: it keeps gathering information throughout the whole test to determine whether strategy A or B is optimal.
- It overlooks the value of exploitation: it does not pursue a higher conversion rate by reducing traffic to strategy A during the test, even as the collected information suggests that strategy B performs better.
After hearing the result of the A/B test, your partner gets excited about his success in suggesting a better strategy. He is curious whether other background colors, such as green, red, and purple, could drive the conversion rate even higher. You can already imagine the scenario getting worse with the current testing method: the regret of A/B testing will likely be even higher when more strategies are involved in the test.
Go beyond A/B Testing
We welcome new strategies, as they may bring us closer to our business goals. However, we often worry about wasting a significant amount of our limited time and resources on poor strategies during the test.
In classic A/B testing, we remain in exploration mode, regardless of whether a strategy appears to be outperforming or underperforming. What if we took the opposite approach and focused solely on exploitation? Here's how it would work: after only a few visitors have made purchases in the e-shop, we guess which strategy appears optimal and direct all remaining visitors to it for the rest of the test. This approach carries a high level of risk, since such a small sample is unlikely to be representative enough to draw the correct conclusions about each strategy's performance.
Now we understand that neither pure exploration nor pure exploitation is a good idea. We need to strike a balance between the two. Moreover, it is clear that we should continuously allocate more traffic to the currently optimal strategy during the test.
In other words, at each step of the test we prefer taking the action that maximizes the estimated reward.
Epsilon-Greedy
The Epsilon-Greedy algorithm builds on this idea by randomly selecting between exploration and exploitation for each visitor. Let's say we want 90% of the traffic focused on exploitation and 10% on exploration. The algorithm follows these steps (a code sketch follows the list):
When a visitor arrives at the e-shop,
- Identify the current optimal strategy
- There is a 90% chance of exploitation: direct the visitor to the current optimal strategy. Alternatively, there is a 10% chance of exploration: direct the visitor to one of the remaining strategies.
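Here is a minimal sketch of these steps in Python; the 90/10 split corresponds to epsilon = 0.1:

```python
import random

def epsilon_greedy(visitors, buyers, epsilon=0.1):
    """Return the index of the strategy to show the next visitor.
    visitors[i] and buyers[i] are the traffic and purchases seen for strategy i."""
    # Estimated conversion rate per strategy (0 if it has received no traffic yet).
    rates = [b / v if v > 0 else 0.0 for v, b in zip(visitors, buyers)]
    best = max(range(len(rates)), key=rates.__getitem__)
    if random.random() < epsilon:
        # Exploration (10% chance): one of the remaining strategies.
        return random.choice([i for i in range(len(rates)) if i != best])
    # Exploitation (90% chance): the current optimal strategy.
    return best

visitors, buyers = [120, 115], [5, 9]      # tallies so far for orange and yellow
choice = epsilon_greedy(visitors, buyers)  # strategy 1 (yellow) about 90% of the time
```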
Over time, we greedily exploit the currently optimal strategy and gradually converge on a consistent choice. However, we never discard the opportunity to experiment with other strategies, even as we approach the final stage of testing. By maximizing the estimated reward with each action, the overall cost of regret with this approach is likely to be smaller than that of classic A/B testing.
Thompson Sampling
Epsilon-Greedy seems promising, but can it be further improved? Let's consider two scenarios:
(A) When only a few visitors have arrived, can we confidently identify the optimal strategy for the entire population?
Probably not. We are concerned that these samples might be outliers. Since there is still high uncertainty, exploration remains necessary.
(B) When thousands of visitors have arrived or we are in the later stages of the test, do we have greater confidence now?
Most likely, because more samples provide higher statistical power, allowing us to identify the strategy that outperforms others in the real population.
Now, let's refine our approach. To choose between exploration and exploitation based on our confidence level, we can implement Thompson Sampling.
Each time a visitor arrives at the e-shop, this approach tracks the total number of buyers and non-buyers for each strategy and models each strategy's conversion rate with a corresponding beta distribution. It then draws a random sample from each distribution and selects the strategy with the highest draw, rather than always serving the strategy with the highest estimated reward. The advantages are (a code sketch follows the list):
- When the sample size is small, even a strategy with a lower conversion rate may be selected (prioritizing exploration).
- As evidence becomes more apparent, the strategy with a higher conversion rate will be chosen most of the time (prioritizing exploitation).
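A minimal sketch of Thompson Sampling for our example, assuming a uniform Beta(1, 1) prior for each strategy:

```python
import random

def thompson_sampling(buyers, non_buyers):
    """Sample each strategy's conversion rate from its Beta posterior
    and return the index of the highest draw."""
    draws = [
        random.betavariate(b + 1, nb + 1)  # Beta(buyers + 1, non-buyers + 1)
        for b, nb in zip(buyers, non_buyers)
    ]
    return max(range(len(draws)), key=draws.__getitem__)

buyers, non_buyers = [5, 9], [115, 106]  # tallies so far for orange and yellow
choice = thompson_sampling(buyers, non_buyers)
# After observing the outcome, update the chosen strategy's tallies:
# buyers[choice] += 1 on a purchase, otherwise non_buyers[choice] += 1.
```

With small tallies, the posteriors are wide and overlap heavily, so the weaker strategy is still drawn often; as the tallies grow, the posteriors sharpen and the better strategy wins almost every draw.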
Contextual Bandits
Thompson Sampling is an effective approach as it dynamically adjusts the balance between exploration and exploitation throughout the test. While it allows us to make informed decisions about which strategy to apply when a customer arrives, these decisions are not personalized for different customer cohorts.
Let's consider two visitors: Peter and Mary. Peter enjoys watching Minions videos with his child every weekend, while Mary has no interest in them. Even in the early stages of the test, we can predict that Peter would prefer visiting the toy e-shop page in yellow (the main color of Minions!) and have a higher likelihood of making a purchase.
This example highlights the importance of considering the visitor's context data. By utilizing Contextual Bandits, we can apply algorithmic decisions more systematically. In a real-life scenario, the context may include historical data about each customer, such as website clicks, past purchases, frequency of opening personalized emails, and even data from their ongoing session, such as recent search queries. The algorithm can then learn to associate different contexts with the strategies that are most likely to lead to conversions for each visitor.
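As one concrete illustration, here is a minimal sketch of the disjoint LinUCB algorithm, a popular contextual bandit; the context features are invented for this example, and alpha is an assumed exploration parameter:

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: per strategy, score = x.theta + alpha * sqrt(x' A^-1 x),
    where the second term is an exploration bonus that shrinks as evidence grows."""

    def __init__(self, n_strategies, n_features, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(n_features) for _ in range(n_strategies)]    # X'X + I per strategy
        self.b = [np.zeros(n_features) for _ in range(n_strategies)]  # X'y per strategy

    def choose(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b  # ridge-regression estimate of the strategy's weights
            scores.append(x @ theta + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, strategy, x, reward):
        self.A[strategy] += np.outer(x, x)
        self.b[strategy] += reward * x

# Hypothetical context: [toy-ad clicks, past purchases, watches Minions videos]
bandit = LinUCB(n_strategies=2, n_features=3)
x_peter = np.array([3.0, 2.0, 1.0])
choice = bandit.choose(x_peter)             # pick a landing-page color for Peter
bandit.update(choice, x_peter, reward=1.0)  # 1 if Peter purchased, else 0
```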
Any More Novel Ideas?
We have gone through various approaches, including classic A/B testing, the Epsilon-Greedy algorithm, Thompson Sampling, and Contextual Bandits.
- Classic A/B testing: Remains in exploration mode for the entire test.
- Epsilon-Greedy algorithm: Randomly balances between exploration and exploitation.
- Thompson Sampling: Places more emphasis on exploitation as the sample size for each strategy increases.
- Contextual Bandits: Provides personalized, optimal strategies based on the context (side information) of each visitor.
These approaches have been further refined and discussed in recent research papers. Here are a few examples:
- "Neural Contextual Bandits with Deep Representation and Shallow Exploration": Neural Contextual Bandits rely on the exploration which takes place over the entire network parameter space, this is typically inefficient for large-size networks. The paper suggests a new algorithm with shallow exploration and setting of a regret bound.
- "An Empirical Evaluation of Federated Contextual Bandit Algorithms": This paper explores the combination of federated learning, a decentralized approach to training Machine Learning models, with contextual bandits. It addresses concerns related to incorporating federated learning, such as utilizing a small amount of pre-training data.
Before you go
If you enjoyed this read, I invite you to follow my Medium page. By doing so, you can stay updated with exciting content related to data science, project management, and self-improvement.