Methodology

Model

The base Autotune implementation uses a Thompson Sampling (Bayesian) algorithm to estimate each variant's probability of being the best variant and allocate a proportional amount of traffic.

For example, if a given variant has a 60% probability of being the best, Autotune will provide it 60% of the traffic. At a high level, the multi-armed bandit algorithm works by adding more users to a treatment as soon as it recognizes that it is clearly better in maximizing the reward (the target metric).

Throughout the process, higher performing treatments are allocated more traffic whereas underperforming treatments are allocated less traffic. When the winning treatment beats the second best treatment by a specified margin, the process terminates.

Some helpful references:

Statsig Blog
Goyal and Agrawal (Microsoft Research) Regret Analysis
Doordash Engineering Summary Blog

Advantages

The major advantage of the base Multi-Armed Bandit over a contextual bandit is its ability to converge and identify a best variant. In cases where there is a "one-size-fits-all" solution, this is an elegant way to efficiently allocate traffic and determine a correct long-term solution while minimizing "regret" - the outcome of a high number of users being exposed to the worse variant like in an a/b test.

Disadvantages

The major disadvantages of a Multi-Armed Bandit as compared to a contextual bandit is its inability to personalize; in cases where there's interactions between a user's attributes and variants, vanilla Autotune can identify a global maxima that is worse than serving users their individual best variant.

You can see this in a toy example; even if the "US Flag" variant had the highest overall CTR, it would be a bad choice for CA users. In this example, both groups will converge to a sub-optimal variant.

	A/B/n Test	Multi-Armed Bandit (Autotune)	Contextual Bandit (Autotune AI)	Ranking Engine
Typical # Variants	2-3	4-8	4-8	Arbitrary #
Personalization Factor	None	None	Moderate	High
Input Data Required	None	Very Little (100+ samples)	Little - generally 1000+ samples	Tens of thousands to millions of samples
Model Efficacy	None	Basic	Moderate	High
Identifies Best Variant	Yes	Yes	No	No
Consistent User Assignment	Yes	No	No	No

Model​

Advantages​

Disadvantages​

Model

Advantages

Disadvantages