Methodology

How multi-armed bandits work in Statsig Autotune to automatically allocate traffic to the best-performing variant based on a single goal metric.

Model

The base Autotune implementation uses a Thompson Sampling (Bayesian) algorithm to estimate each variant's probability of being the best variant and allocate a proportional amount of traffic.

For example, if a given variant has a 60% probability of being the best, Autotune allocates 60% of the traffic to it. The multi-armed bandit algorithm adds more users to a treatment as soon as it determines that treatment is clearly better at maximizing the reward (the target metric).
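To make the "probability of being the best" idea concrete, here is a minimal sketch that estimates it by Monte Carlo sampling. It is not Statsig's implementation: it assumes a binary reward (such as click or no click), a uniform Beta(1, 1) prior, and made-up counts, and the function name `probability_best` is hypothetical.

```python
import random


def probability_best(successes, failures, draws=20000, seed=0):
    """Estimate each variant's probability of being the best.

    Assumption: binary rewards with a Beta(1, 1) prior, so each variant's
    posterior is Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        # Draw one plausible conversion rate per variant from its posterior.
        samples = [
            rng.betavariate(1 + s, 1 + f)
            for s, f in zip(successes, failures)
        ]
        # Credit the variant whose sampled rate is highest in this draw.
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


# Hypothetical data: three variants, 1,000 exposures each.
successes = [120, 100, 90]
failures = [880, 900, 910]

# The share of draws each variant wins doubles as its traffic allocation.
allocation = probability_best(successes, failures)
for name, share in zip(["A", "B", "C"], allocation):
    print(f"variant {name}: {share:.1%} of traffic")
```

Under these assumptions, the variant with the highest observed rate receives most of the traffic, while the others keep a small share so the estimate can still correct itself.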

Throughout the process, higher-performing treatments receive more traffic and underperforming treatments receive less. When the winning treatment beats the second-best treatment by a specified margin, the process ends.
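The loop described above can be sketched as a small simulation. The exact stopping criterion is not specified here, so this sketch assumes the process ends when the leader's probability of being best exceeds the runner-up's by `margin`. The function `run_bandit`, the batch size, the true rates, and the cap on batches are all hypothetical choices for illustration.

```python
import random


def run_bandit(true_rates, batch_size=500, margin=0.9,
               max_batches=200, draws=2000, seed=1):
    """Simulate batched Thompson Sampling with an assumed stopping rule."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    p_best = [1.0 / k] * k

    for batch in range(1, max_batches + 1):
        # Re-estimate each arm's probability of being best.
        wins = [0] * k
        for _ in range(draws):
            samples = [
                rng.betavariate(1 + successes[i], 1 + failures[i])
                for i in range(k)
            ]
            wins[samples.index(max(samples))] += 1
        p_best = [w / draws for w in wins]

        # Assumed stopping rule: leader beats runner-up by `margin`.
        ranked = sorted(p_best, reverse=True)
        if ranked[0] - ranked[1] >= margin:
            return p_best.index(ranked[0]), batch, p_best

        # Allocate the next batch of users in proportion to p_best.
        for _ in range(batch_size):
            arm = rng.choices(range(k), weights=p_best)[0]
            if rng.random() < true_rates[arm]:
                successes[arm] += 1
            else:
                failures[arm] += 1

    return None, max_batches, p_best  # no winner within the batch cap


# Hypothetical true conversion rates; the third arm is clearly best.
winner, batches, p_best = run_bandit([0.05, 0.06, 0.15])
print(f"winner: arm {winner} after {batches} batches")
```

A larger `margin` ends the process later but with more confidence in the winner. A smaller one ends it sooner at a higher risk of picking the wrong arm.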

Advantages

The main advantage of the base Multi-Armed Bandit over a contextual bandit is its ability to converge and identify the best variant. When a single solution works well for all users, the Multi-Armed Bandit efficiently allocates traffic and determines the correct long-term solution while minimizing regret (the cost of exposing many users to a worse variant, as happens in an A/B test).

Disadvantages

The main disadvantage of a Multi-Armed Bandit compared to a contextual bandit is its inability to personalize. When user attributes interact with variants, Autotune can identify a global maximum that is worse than serving each user their individual best variant.

For example, even if the "US Flag" variant had the highest overall CTR, it would be a poor choice for Canadian users. In such cases, both groups converge on the same variant, which is sub-optimal for at least one of them.
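The cost of this limitation can be worked through with a few lines of arithmetic. The CTRs, traffic shares, and variant names below are invented purely for illustration. The sketch compares the single globally best variant with serving each country its own best variant.

```python
# Hypothetical CTRs per country and variant (illustrative numbers only).
ctr = {
    "US": {"us_flag": 0.10, "ca_flag": 0.04},
    "CA": {"us_flag": 0.03, "ca_flag": 0.09},
}
# Hypothetical share of total traffic from each country.
traffic_share = {"US": 0.8, "CA": 0.2}


def blended_ctr(variant):
    """Overall CTR if every user is served the same variant."""
    return sum(traffic_share[c] * ctr[c][variant] for c in ctr)


variants = ["us_flag", "ca_flag"]

# What a non-contextual bandit converges to: one variant for everyone.
global_best = max(variants, key=blended_ctr)
global_ctr = blended_ctr(global_best)

# What per-user personalization could reach: each country's own best variant.
personalized_ctr = sum(
    traffic_share[c] * max(ctr[c].values()) for c in ctr
)

print(f"global best: {global_best} at {global_ctr:.3f} CTR")
print(f"personalized: {personalized_ctr:.3f} CTR")
```

With these numbers the global winner is `us_flag`, yet Canadian users would have clicked three times as often on `ca_flag`. That gap is what a contextual bandit is designed to recover.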
