Contextual Bandit

Use Statsig Contextual Bandits to personalize variant selection per user based on context features while maximizing a single goal metric over time.

Model

Autotune AI implements a variant of the LinUCB algorithm. The algorithm estimates a user's outcome for each variant and incorporates the model's uncertainty to determine an "upper bound" for the prediction. Autotune AI uses this upper bound to select a variant, which is the explore mechanism. In practical terms, the bandit selects for the highest predicted potential upside.

For example, suppose Autotune AI has two variants and predicts the following click-through rates for user "Bob":

Variant A: 6% ± 0.5%
Variant B: 4% ± 3%

Autotune AI serves the user variant B, even though A has a higher prediction, because the "upside" of B's prediction is 7% versus A's 6.5%. A variant's prediction usually has higher variance when there are few samples or when the relationship between features and outcome is uncertain. As Statsig delivers more traffic, the uncertainty shrinks. After more samples, the values might become:

Variant A: 5.9% ± 0.4%
Variant B: 4.2% ± 1.5%

In that case, Autotune AI serves A, because A's upper bound of 6.3% now exceeds B's 5.7%.
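The selection rule in this example can be sketched in a few lines of Python. This is an illustration only: the `pick_variant` helper and the data shape are assumptions made for the example, not part of any Statsig SDK.

```python
def pick_variant(predictions):
    """Return the variant whose upper bound (estimate + uncertainty) is highest.

    `predictions` maps a variant name to (estimate, uncertainty), both in
    percentage points. The helper name and data shape are illustrative.
    """
    return max(predictions, key=lambda name: sum(predictions[name]))


# Values taken from the example above
early = {"A": (6.0, 0.5), "B": (4.0, 3.0)}
later = {"A": (5.9, 0.4), "B": (4.2, 1.5)}

early_choice = pick_variant(early)  # B wins: 7.0 beats 6.5
later_choice = pick_variant(later)  # A wins: 6.3 beats 5.7
```

Ranking on the upper bound rather than the point estimate is what lets an uncertain variant keep receiving traffic until its estimate firms up.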

Feature types

Autotune AI works with categorical and numerical features. Statsig converts key-value pairs attached to the custom object on the Statsig user into categorical or numerical features based on their data type and one-hot-encodes categorical features.
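As a rough sketch of what this conversion might look like, the following Python turns a custom object into a flat numeric feature set. The `encode_features` helper, the `vocabulary` argument, and the field names are hypothetical. Treating booleans as categorical is also an assumption of this sketch, not documented Statsig behavior.

```python
def encode_features(custom, vocabulary):
    """Turn a dict of custom user fields into a flat numeric feature dict.

    `vocabulary` maps each categorical field to the values seen so far.
    Hypothetical helper; not part of any Statsig SDK.
    """
    features = {}
    for key, value in custom.items():
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            features[key] = float(value)  # numerical: pass through
        else:
            # categorical: one indicator column per known value
            for known in vocabulary.get(key, []):
                features[f"{key}={known}"] = 1.0 if str(value) == known else 0.0
    return features


custom = {"cart_value": 82.5, "country": "CA"}
vocabulary = {"country": ["US", "CA", "GB"]}
encoded = encode_features(custom, vocabulary)
```

Here `encoded` keeps `cart_value` as a number and expands `country` into one indicator per known value, with only `country=CA` set to 1.0.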

You don't need to build complex training pipelines. The features you use for model evaluation also train the model.

Outcome types

Autotune AI supports several model types internally, enabling use for both classification outcomes (for example, whether a user clicks a button) and continuous outcomes (for example, how much time a user spends reading articles).

These models follow the LinUCB family approach: normalizing data, creating a linear model to estimate an outcome, and applying model uncertainty to score an upper bound.
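For intuition, here is a minimal sketch of the textbook per-variant LinUCB model: a ridge regression plus an uncertainty bonus. It is not Statsig's implementation, which this page describes only as a variant of LinUCB. The class name, the `alpha` and `ridge` parameters, and the assumption that features arrive already normalized are all illustrative.

```python
import math


class LinUCBArm:
    """One variant's ridge-regression model with an uncertainty bonus."""

    def __init__(self, n_features, alpha=1.0, ridge=1.0):
        self.alpha = alpha  # width of the uncertainty bonus (assumed tunable)
        # a_inv starts as (ridge * I)^-1; b accumulates reward-weighted features
        self.a_inv = [[(1.0 / ridge) if i == j else 0.0 for j in range(n_features)]
                      for i in range(n_features)]
        self.b = [0.0] * n_features

    def _a_inv_times(self, vec):
        return [sum(r * v for r, v in zip(row, vec)) for row in self.a_inv]

    def estimate(self, x):
        # Linear prediction theta . x, where theta = A^-1 b
        theta = self._a_inv_times(self.b)
        return sum(t * xi for t, xi in zip(theta, x))

    def uncertainty(self, x):
        # sqrt(x^T A^-1 x): shrinks as similar contexts accumulate
        a_inv_x = self._a_inv_times(x)
        return math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))

    def upper_bound(self, x):
        return self.estimate(x) + self.alpha * self.uncertainty(x)

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^-1 after adding x x^T to A
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


arm = LinUCBArm(n_features=2)
context = [1.0, 0.5]
initial_uncertainty = arm.uncertainty(context)
for _ in range(20):
    arm.update(context, reward=1.0)
final_uncertainty = arm.uncertainty(context)
```

After 20 observations of the same context, `final_uncertainty` is well below `initial_uncertainty`. The same shrinking effect appears in the click-through example earlier on this page.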

For classification outcomes, Autotune AI checks whether the outcome event occurs at least once within its attribution window. For continuous outcomes, Autotune AI requires an event name and field name and uses the numerical value from the first observed event after exposure.

Advantages

The primary advantage of a contextual bandit is its ability to optimize based on user attributes. This capability lets you optimize product and marketing outcomes beyond selecting a single "best experiment." Contextual Bandits function with very little training data, and Statsig retrains them hourly, so you have a functional personalization tool running within hours of launching your bandit.

For example:

Increasing Outcomes: Suppose you offer discount promotions at checkout to increase completion rates. Based on the total value of a user's cart, the user's spending history, and the user's country code, users might respond differently to a "10% off" coupon versus a "Free shipping" coupon.
Avoiding Harm: Suppose you want to show a referral-code upsell to users, but not to users who won't share. Autotune AI can help you target users likely to copy the referral link and avoid showing the unit to users who dismiss it.

Disadvantages

The primary disadvantages of a Contextual Bandit compared to a Multi-Armed Bandit are that it doesn't converge on a single winning variant and that it can overfit to training data. Statsig uses regression formats (ridge, normalized logistic regression, and others) that deliberately omit low-signal predictors to reduce overfitting.

Contextual bandit models might also miss complex feature interactions that a more complex model (such as a well-tuned GBDT or neural network) can exploit. Contextual bandits are a useful personalization tool, but aren't likely to outperform a dedicated ML team.

Methodology

Samples required

Contextual Bandits can start personalizing after only tens of samples. Statsig uses most of the initial traffic for exploration.

Attribution

Statsig attributes outcomes by joining the target event to downstream events within the attribution window. For metadata-based contextual bandits, Statsig uses the first observed event by logging timestamp. For binary bandits (did an event happen), the outcome is 1 if at least one event occurs during the attribution window and 0 otherwise.
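The two attribution rules can be sketched as a single Python function. The function name, event shape, and field names are hypothetical. Two details are assumptions of this sketch rather than documented behavior: window boundaries are treated as inclusive, and a continuous outcome with no events in the window returns `None`.

```python
def attribute_outcome(exposure_ts, events, window_seconds, value_field=None):
    """Return the outcome for one exposure.

    `events` is a list of dicts with a "timestamp" (in seconds) plus optional
    metadata fields. All names here are hypothetical.
    """
    in_window = sorted(
        (e for e in events
         if exposure_ts <= e["timestamp"] <= exposure_ts + window_seconds),
        key=lambda e: e["timestamp"],
    )
    if value_field is None:
        # Binary outcome: 1 if any event landed in the window, else 0
        return 1 if in_window else 0
    # Continuous outcome: value from the first observed event, if any
    return float(in_window[0][value_field]) if in_window else None


events = [
    {"timestamp": 130, "seconds_read": 42.0},
    {"timestamp": 110, "seconds_read": 12.5},
    {"timestamp": 900, "seconds_read": 99.0},
]
clicked = attribute_outcome(100, events, 300)
first_value = attribute_outcome(100, events, 300, "seconds_read")
```

With these inputs, `clicked` is 1 and `first_value` is 12.5: the event at timestamp 110 is the earliest one inside the window, and the event at 900 falls outside it.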

If you run multiple bandits, Statsig applies no cross-bandit attribution logic. If multiple bandits share the same outcome event in their attribution window, all bandits count that event as part of their outcome space.

Exploration

During the explore period, Statsig assigns all units a random variant. After the explore period, a small portion of traffic still receives a managed "Explore" (random) variant to keep the model from becoming stale. This portion decreases to a terminal 1% based on timeline and samples observed. Statsig distributes explore traffic inversely to the current distribution, which up-samples "rare" variants. Exploration therefore over-represents underperforming variants to give them a chance to improve.
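This page does not give the exact formula for the inverse distribution. One plausible reading is to weight each variant by the reciprocal of its current traffic share and then normalize. The Python below sketches that reading only; the `explore_weights` helper and the example shares are assumptions.

```python
def explore_weights(current_share):
    """Weight each variant by the inverse of its current traffic share.

    Assumes every share is greater than zero. Hypothetical helper showing
    one possible interpretation, not Statsig's actual formula.
    """
    inverse = {name: 1.0 / share for name, share in current_share.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


weights = explore_weights({"A": 0.8, "B": 0.15, "C": 0.05})
```

Under this reading, variant C, which currently receives the least traffic, gets the largest share of explore traffic, and variant A gets the smallest.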

Exploration assignments appear in the log stream with :explore appended.
