Methodology
How multi-armed bandits work in Statsig Autotune to automatically allocate traffic to the best-performing variant based on a single goal metric.
Model
The base Autotune implementation uses a Thompson Sampling (Bayesian) algorithm to estimate each variant's probability of being the best variant and allocate a proportional amount of traffic.
For example, if a given variant has a 60% probability of being the best, Autotune allocates 60% of the traffic to it. The multi-armed bandit algorithm adds more users to a treatment as soon as it determines that treatment is clearly better at maximizing the reward (the target metric).
Throughout the process, higher-performing treatments receive more traffic and underperforming treatments receive less. When the winning treatment beats the second-best treatment by a specified margin, the process ends.
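The allocation rule described above can be sketched in a few lines. This is a minimal illustration, not Statsig's implementation: it assumes a binary reward (for example, click or no click), a uniform Beta(1, 1) prior per variant, and a Monte Carlo estimate of each variant's probability of being best. The function name `prob_best` and the counts are hypothetical.

```python
import random

def prob_best(successes, failures, draws=20000, seed=0):
    """Monte Carlo estimate of each variant's probability of being the best.

    Assumes a Bernoulli reward with a Beta(1, 1) prior per variant
    (an illustrative assumption, not a documented Autotune detail).
    """
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        # Draw one plausible conversion rate per variant from its posterior.
        samples = [
            rng.betavariate(1 + s, 1 + f)
            for s, f in zip(successes, failures)
        ]
        # Credit the variant whose sampled rate is highest in this draw.
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

# Hypothetical observed data: conversions and non-conversions per variant.
successes = [30, 45, 38]
failures = [270, 255, 262]

# Traffic share per variant, proportional to its probability of being best.
allocation = prob_best(successes, failures)
print(allocation)
```

Each entry of `allocation` is the share of traffic that variant would receive under this sketch. As more data arrives, the posteriors narrow and the shares drift toward the leading variant.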
Some helpful references:
- Statsig Blog
- Goyal and Agrawal (Microsoft Research) Regret Analysis
- Doordash Engineering Summary Blog
Advantages
The main advantage of the base Multi-Armed Bandit over a contextual bandit is its ability to converge and identify the best variant. When a single solution works well for all users, the Multi-Armed Bandit efficiently allocates traffic and determines the correct long-term solution while minimizing regret (the cost of exposing many users to a worse variant, as happens in an A/B test).
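To make the regret comparison concrete, the following sketch simulates a fixed equal split (as in an A/B/n test) against a Thompson Sampling bandit and sums the expected regret of each. The true conversion rates, the user count, and the per-user posterior update are assumptions chosen for illustration; a production system may update allocations in batches instead.

```python
import random

def simulate_regret(true_rates, users, adaptive, seed=0):
    """Total expected regret after serving `users` visitors.

    Regret per visitor is the best variant's true rate minus the true
    rate of the variant actually served.
    """
    rng = random.Random(seed)
    best = max(true_rates)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    regret = 0.0
    for _ in range(users):
        if adaptive:
            # Thompson Sampling: sample each posterior, serve the highest draw.
            samples = [
                rng.betavariate(1 + successes[i], 1 + failures[i])
                for i in range(k)
            ]
            arm = samples.index(max(samples))
        else:
            # Fixed equal split: every variant is equally likely.
            arm = rng.randrange(k)
        # Observe a simulated binary reward and update the counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        regret += best - true_rates[arm]
    return regret

# Hypothetical true conversion rates for three variants.
true_rates = [0.05, 0.10, 0.20]
fixed_split = simulate_regret(true_rates, users=5000, adaptive=False)
bandit = simulate_regret(true_rates, users=5000, adaptive=True)
print(f"fixed split regret: {fixed_split:.1f}, bandit regret: {bandit:.1f}")
```

Under these assumptions the fixed split keeps sending about two thirds of visitors to worse variants for the whole run, while the bandit shifts traffic away from them as evidence accumulates.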
Disadvantages
The main disadvantage of a Multi-Armed Bandit compared to a contextual bandit is its inability to personalize. When user attributes interact with variants, Autotune can identify a global maximum that is worse than serving each user their individual best variant.
For example, even if the "US Flag" variant had the highest overall CTR, it would be a poor choice for Canadian users. In such cases, all users converge to the same variant, even though it is sub-optimal for some of them.
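A short worked example shows how a global winner can underperform per-segment choices. All numbers here are hypothetical, including the traffic mix and the second "Canada Flag" variant, which is introduced only for illustration.

```python
# Hypothetical share of traffic per user segment.
traffic_share = {"US": 0.8, "Canada": 0.2}

# Hypothetical click-through rate of each variant within each segment.
ctr = {
    "US Flag": {"US": 0.10, "Canada": 0.02},
    "Canada Flag": {"US": 0.04, "Canada": 0.09},
}

def overall_ctr(variant):
    """Traffic-weighted CTR if every user sees the same variant."""
    return sum(traffic_share[seg] * ctr[variant][seg] for seg in traffic_share)

# A non-contextual bandit converges on the single best overall variant.
global_best = max(ctr, key=overall_ctr)
global_ctr = overall_ctr(global_best)

# A personalized policy serves each segment its own best variant.
personalized_ctr = sum(
    traffic_share[seg] * max(ctr[v][seg] for v in ctr)
    for seg in traffic_share
)

print(global_best, round(global_ctr, 3), round(personalized_ctr, 3))
```

With these numbers the "US Flag" variant wins overall, yet serving each segment its own best variant yields a higher blended CTR. That gap is what a contextual bandit tries to capture.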
| | A/B/n Test | Multi-Armed Bandit (Autotune) | Contextual Bandit (Autotune AI) | Ranking Engine |
|---|---|---|---|---|
| Typical # Variants | 2-3 | 4-8 | 4-8 | Arbitrary # |
| Personalization Factor | None | None | Moderate | High |
| Input Data Required | None | Very Little (100+ samples) | Little (generally 1000+ samples) | Tens of thousands to millions of samples |
| Model Efficacy | None | Basic | Moderate | High |
| Identifies Best Variant | Yes | Yes | No | No |
| Consistent User Assignment | Yes | No | No | No |