Autotune (Bandits)

Autotune and Autotune AI automatically weigh explore versus exploit to deliver the best-performing variant for a single metric.

Autotune and Autotune AI are Statsig's Multi-Armed Bandit solutions. They find the best variant among a group of candidates while dynamically allocating traffic to optimize a single target metric. A standard A/B test holds traffic allocation fixed to measure each variant's impact. Autotune instead shifts traffic toward the better-performing variants as results come in.

Autotune, the Multi-Armed Bandit solution, allocates traffic toward high-performing variants and can eventually identify a winning variant.

How Autotune works

Autotune is Statsig's Bayesian Multi-Armed Bandit. It tests and measures different variations and their effect on a target outcome. The multi-armed bandit continuously adjusts traffic toward the best-performing variations until it can confidently pick the best variation, which then receives 100% of traffic.
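To make "confidently pick the best variation" concrete, the following is a minimal sketch of one common Bayesian approach: model each variant's conversion rate with a Beta posterior and estimate, by Monte Carlo sampling, the probability that each variant is the best. The uniform Beta(1, 1) prior, the function names, and the 0.95 threshold are assumptions made for illustration, not Statsig's documented implementation.

```python
import random

def probability_best(successes, failures, draws=20000, seed=0):
    """Estimate P(variant is best) for each variant by sampling Beta posteriors."""
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        # One draw per variant from Beta(successes + 1, failures + 1),
        # the posterior under an assumed uniform Beta(1, 1) prior.
        samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

def pick_winner(successes, failures, threshold=0.95):
    """Return the index of the winning variant, or None if no variant is confident yet.

    The threshold is a hypothetical value chosen for this sketch.
    """
    probs = probability_best(successes, failures)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None
```

With little data or similar conversion rates, `pick_winner` returns `None` and the bandit would keep splitting traffic. Once one variant clearly dominates, it returns that variant's index.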

Bandits balance the explore/exploit problem: exploiting the current best-known solution versus exploring to gather more information about other solutions.
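The explore/exploit balance can be illustrated with Thompson sampling, a standard Bayesian bandit technique. This is a sketch under stated assumptions rather than Statsig's actual algorithm: it simulates binary outcomes (for example, click or no click) with made-up true conversion rates, and the function and parameter names are hypothetical.

```python
import random

def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate a Beta-Bernoulli Thompson sampling bandit.

    true_rates are the simulated (normally unknown) conversion rates per variant.
    Returns how many times each variant was served.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    successes = [0] * n
    failures = [0] * n
    pulls = [0] * n
    for _ in range(rounds):
        # Explore: each variant gets a random draw from its posterior, so
        # uncertain variants can still win the draw occasionally.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(n)]
        # Exploit: serve the variant whose draw is highest.
        arm = max(range(n), key=lambda i: samples[i])
        pulls[arm] += 1
        # Simulate the user's response and update that variant's posterior.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls
```

Calling `thompson_sampling([0.05, 0.10, 0.30], rounds=5000)` returns per-variant serve counts. As evidence accumulates, the posteriors of weaker variants concentrate on low values, so they win fewer draws and receive less traffic.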

The blog posts on Multi-Armed Bandits and Contextual Bandits go into depth on use cases and considerations.

	A/B/n Test	Multi-Armed Bandit (Autotune)	Contextual Bandit (Autotune AI)	Ranking Engine
Typical # Variants	2-3	4-8	4-8	Arbitrary #
Personalization Factor	None	None	Moderate	High
Input Data Required	None	Very Little (100+ samples)	Little (generally 1,000+ samples)	Tens of thousands to millions of samples
Model Efficacy	None	Basic	Moderate	High
Identifies Best Variant	Yes	Yes	No	No
Consistent User Assignment	Yes	No	No	No

Implementing Autotune

Implementing an Autotune requires checking an experiment in Statsig. After initialization, or on server SDKs, the check has sub-millisecond latency.

Autotune has a JSON config associated with each variant. The SDK returns this config. You can use the config to modify elements of your webpage, such as an image URL or button color. You can also use the config to identify which variant is active, so you know which code to run.
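The two uses of the config described above can be sketched as follows. The SDK call that fetches the config is not shown, so a plain dictionary stands in for the returned JSON. The keys `button_color`, `image_url`, and `variant_name`, and both function names, are hypothetical.

```python
# Fallback values used when a variant's config omits a key (assumed defaults).
DEFAULTS = {"button_color": "blue", "image_url": "/img/default.png"}

def render_hero(config):
    """Use config values directly to modify page elements."""
    merged = {**DEFAULTS, **config}
    return (
        '<img src="{image_url}">'
        '<button style="background:{button_color}">Sign up</button>'
    ).format(**merged)

def choose_checkout_flow(config):
    """Use the config only to identify the active variant, then branch in code."""
    variant = config.get("variant_name", "control")
    if variant == "one_page":
        return "one_page_checkout"
    return "classic_checkout"

# Stand-in for the JSON config the SDK would return for one variant.
example_config = {"variant_name": "one_page", "button_color": "green"}
html = render_hero(example_config)
flow = choose_checkout_flow(example_config)
```

Merging with defaults keeps the page renderable when a variant's config leaves out a key.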

When to use Autotune

Autotune has two major differences from A/B testing (Statsig Experiments):

The traffic split isn't fixed over the duration of the test. This allows Autotune to divert more traffic to the winner and less to losers, so fewer users see an inferior variant. However, the user experience may not be consistent upon repeated visits.
Autotune can only optimize for a single metric. Autotune can't accurately measure a collection of metrics, and isn't a reliable way to understand secondary effects of your changes. It works best when the metric is well-understood and has a direct, immediate relationship to the change being tested.

Because of these differences, Statsig recommends Autotune in the following scenarios:

The cost of exposing users to a losing treatment is high. For example, sending new users to an inferior landing page may result in lost revenue or churn, and testing two registration flows may cause some users to never sign up. Autotune limits these losses because it adapts quickly to feedback, whereas a static A/B test keeps sending a fixed share of traffic to the losing variant.
You want the decision to be automated. Because Autotune automatically selects the winner, it requires no human decision-making. Automated selection is well-suited for launching many simultaneous tests or running a long-term unmonitored test.
It's acceptable for users to see different experiences upon return visits. For example, when changing text or recommendation algorithms.
You have one simple metric to optimize for (for example, click-through rate) that responds immediately to the change being tested.
You want to test multiple variations. Autotune quickly eliminates poor performers while focusing traffic on the best variants.

Avoid Autotune in the following scenarios:

When you have a complex ecosystem and want to understand secondary effects, tradeoffs between variants, and user behavior.
When you're optimizing for complex metrics or delayed effects.

For these cases, use A/B testing with Experiments. In general, it's also a best practice to run Autotune within an experiment. Use a small holdback group that doesn't receive the Autotune, so you can measure its impact.
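The holdback idea can be sketched with deterministic hashing: a small, stable fraction of users is excluded from Autotune so their metric can be compared with everyone else's. In practice the enclosing experiment would handle this assignment. The function name, salt, and 5% default below are assumptions for illustration only.

```python
import hashlib

def in_holdback(user_id, salt="autotune_holdback", holdback_pct=5.0):
    """Return True if the user falls in the holdback group.

    Hashing the salted user ID gives a stable assignment: the same user
    always lands in the same group, with no stored state.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10000  # bucket in 0..9999
    return bucket < holdback_pct * 100    # 5.0% maps to buckets 0..499

# Users in the holdback see the default experience; everyone else gets Autotune.
experience = "default" if in_holdback("user-123") else "autotune"
```

Comparing the target metric between the holdback group and the Autotune group estimates the overall lift that Autotune delivers.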
