Contextual Bandit (Autotune AI)

Introduction to Statsig Contextual Bandits, which choose the best variant per user based on context features and continuous learning from outcomes.

Contextual Multi-Armed Bandits are a subset of Multi-Armed Bandits that use context about a user to personalize their experience. Autotune AI predicts an outcome for each variant and selects the variant with the best predicted outcome after accounting for uncertainty. It can prefer a variant with high uncertainty over a variant with a slightly better prediction but low uncertainty, which drives exploration. Unlike a non-contextual multi-armed bandit, which converges on one global winner for all users, a contextual bandit personalizes the variant Statsig serves each user based on their context features.

Use cases

Contextual bandits bridge the gap between un-personalized solutions and fully fledged ranking solutions. The main limitations are that contextual bandits:

Have a fixed output set of variants they can show
Have limited ability to account for complex context about the "object" being shown, or to predict for novel content (e.g. video ranking)

Their simplicity also provides advantages. Statsig's Autotune AI evaluates in near-real-time on both the server and client, taking a few milliseconds or less to return the ideal experience for a given user context. Contextual bandits are also simple to set up and test. You can set up a test in less than an hour, get model results the next hour, and start seeing experiment results in the Statsig console the hour after that.

For more discussion on use cases and motivations, refer to the Statsig blog.

Methodology

Statsig's Autotune AI uses a LinUCB-based approach. A good introduction to the topic is the paper by Li, Chu, Langford, and Schapire, "A Contextual-Bandit Approach to Personalized News Article Recommendation." For coverage of regret analysis, the lecture notes by Jain (University of Washington) are a useful resource.

Autotune AI works with categorical and numerical features. Statsig converts key-value pairs attached to the custom object on the Statsig user into categorical or numerical features based on their data type. Statsig one-hot-encodes categorical features. You don't need to build complex training pipelines, though many customers pass pre-evaluated user attributes or predictions as context objects.
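The type-based conversion described above can be sketched as follows. This is a minimal illustration, not Statsig's implementation: the function name `encode_features`, the `categories` vocabulary argument, and the `key=value` column naming are all assumptions made for the example.

```python
def encode_features(custom, categories):
    """Flatten a custom key-value object into numeric features.

    `categories` maps each categorical key to its known values. That
    vocabulary is an assumption of this sketch; a real system would
    derive it from training data.
    """
    features = {}
    for key, value in custom.items():
        if isinstance(value, str):
            # Categorical: one 0/1 column per known value (one-hot encoding).
            for known in categories.get(key, []):
                features[f"{key}={known}"] = 1.0 if value == known else 0.0
        elif isinstance(value, (int, float)):
            # Numerical: pass the value through as a float.
            features[key] = float(value)
    return features


encoded = encode_features(
    {"plan": "pro", "sessions_7d": 12},
    {"plan": ["free", "pro"]},
)
print(encoded)
```

Here the string `plan` becomes two indicator columns, while the integer `sessions_7d` stays a single numeric column.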

Statsig's console supports specifying features in advance. This option can help you identify which features to fetch for the bandit when lookups are expensive or live. For Warehouse Native customers, planned work will allow joining entity properties during analysis, so that you can plug in your own feature store for Autotune AI analysis, similar to the approach used with CURE.

Statsig selects the best model (for example, ridge or logistic regression) based on your data types and performance, then generates a model from your data. The model's estimated standard error produces a prediction confidence interval. During evaluation, Statsig uses user context to predict an outcome for each variant and applies the corresponding confidence interval to that prediction. The best variant is the one whose 95% confidence interval has the highest upper end. To adjust the interval size, modify the exploration parameter on the Autotune setup page.
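The selection rule above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Statsig's code: it uses one linear model and one standard error per variant, treats the exploration parameter as a multiplier on the interval half-width, and the names `upper_bound`, `choose_variant`, and the model dictionary layout are hypothetical.

```python
Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def upper_bound(model, features, exploration=1.0):
    """Predicted outcome plus the (scaled) half-width of the 95% interval."""
    prediction = sum(
        model["weights"].get(name, 0.0) * value
        for name, value in features.items()
    )
    return prediction + exploration * Z_95 * model["std_error"]


def choose_variant(models, features, exploration=1.0):
    """Return the variant with the highest upper bound, plus all scores."""
    scores = {
        name: upper_bound(model, features, exploration)
        for name, model in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical fitted models: "treatment" predicts slightly lower but is
# far less certain than "control".
models = {
    "control": {"weights": {"bias": 0.10, "plan=pro": 0.02}, "std_error": 0.01},
    "treatment": {"weights": {"bias": 0.09, "plan=pro": 0.02}, "std_error": 0.03},
}
features = {"bias": 1.0, "plan=pro": 1.0}

explore_choice, _ = choose_variant(models, features)
greedy_choice, _ = choose_variant(models, features, exploration=0.0)
print(explore_choice, greedy_choice)
```

With the default interval, the less certain `treatment` wins because its upper bound is higher. With the exploration multiplier set to zero, the rule reduces to picking the best point prediction, which is `control`.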

For a detailed discussion, refer to the Methodology page.

You can also fetch a ranked list from Statsig and manually expose the variants you show to the user. This approach is useful when you have client-side filtering or want to show multiple options. Refer to Advanced Usage.
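As a rough sketch of that pattern, the example below filters a ranked list on the client and records an exposure only for the variants actually shown. The function `show_top`, the `is_available` predicate, and the `log_exposure` callback are hypothetical stand-ins; the real SDK calls are covered in Advanced Usage.

```python
def show_top(ranked_variants, is_available, log_exposure, limit=1):
    """Pick the first `limit` available variants and log an exposure for each."""
    shown = [v for v in ranked_variants if is_available(v)][:limit]
    for variant in shown:
        # Only variants the user actually sees should count as exposures.
        log_exposure(variant)
    return shown


ranked = ["banner_c", "banner_a", "banner_b"]  # best first
in_stock = {"banner_a", "banner_b"}            # client-side filter
exposures = []

shown = show_top(ranked, lambda v: v in in_stock, exposures.append, limit=2)
print(shown)
```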

Drawbacks

Because Statsig manages the models, it can't guarantee perfect model tuning, and Statsig doesn't offer more advanced models such as neural networks. If recommendations are a critical business problem, this feature can serve as a starting point but isn't an appropriate long-term solution.

The current approach balances simplicity, speed, and regret minimization. Statsig may not fully support specific use cases such as real-time updates.

Because the models generally assume linearity, they may not capture complex user interactions. This approach works best for broad-level effects, though feature interaction terms can provide reasonable predictive power for conditional relationships between predictors and outcomes.
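To make "feature interaction terms" concrete, the sketch below adds pairwise products to an already encoded feature dictionary, which lets a linear model fit an effect that holds only when two features occur together. This is an illustration of the general technique. Whether you compute such terms yourself or pass them in as context is not specified here, and the `add_interactions` helper and `a*b` naming are assumptions.

```python
from itertools import combinations


def add_interactions(features):
    """Return a copy of `features` with every pairwise product added."""
    expanded = dict(features)
    for (k1, v1), (k2, v2) in combinations(sorted(features.items()), 2):
        expanded[f"{k1}*{k2}"] = v1 * v2
    return expanded


expanded = add_interactions({"is_mobile": 1.0, "plan=pro": 1.0, "sessions": 3.0})
print(expanded)
```

The number of pairwise terms grows quadratically with the number of features, so in practice you would likely restrict this to a few pairs you expect to matter.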

Outcome types

Autotune AI supports multiple model types internally, covering both classification use cases (for example, whether a user clicks a button) and continuous outcomes (for example, how much time a user spends reading articles). You can optimize for both "outcomes" and "metrics." To minimize a metric such as latency, disable the "higher is better" setting for that metric.

For classification cases, Autotune AI identifies whether any outcome occurs within its attribution window. For continuous cases, Autotune AI requires an event name and field name, and uses the numerical value associated with that field.
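The two outcome types can be sketched as follows. This is a minimal illustration with hypothetical names and event shapes: summing the field across matching events and negating the value when "higher is better" is disabled are assumptions of the example, not documented behavior.

```python
from datetime import datetime, timedelta


def classification_label(exposure_time, outcome_times, window):
    """1 if any outcome falls inside the attribution window, else 0."""
    deadline = exposure_time + window
    return int(any(exposure_time <= t <= deadline for t in outcome_times))


def continuous_value(events, event_name, field_name):
    """Sum a numeric field over matching events (aggregation is assumed)."""
    return sum(
        float(e["fields"][field_name])
        for e in events
        if e["name"] == event_name and field_name in e["fields"]
    )


def signed_reward(value, higher_is_better=True):
    """Flip the sign for metrics to minimize, such as latency (assumed)."""
    return value if higher_is_better else -value


exposure = datetime(2024, 1, 1, 12, 0)
window = timedelta(hours=1)
clicked = classification_label(exposure, [exposure + timedelta(minutes=30)], window)
late = classification_label(exposure, [exposure + timedelta(hours=2)], window)

events = [
    {"name": "article_read", "fields": {"seconds": 30.0}},
    {"name": "article_read", "fields": {"seconds": 12.5}},
    {"name": "page_view", "fields": {}},
]
read_time = continuous_value(events, "article_read", "seconds")
print(clicked, late, read_time)
```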

Training

Statsig runs training pipelines hourly.

For Warehouse Native customers, Statsig processes data in your warehouse and uses an anonymized feature set to train the models. Statsig exports exposures on-demand for each load up to the first million, and in daily batches after that. If you use Statsig to log outcomes, Statsig exports those log events hourly. Otherwise, you can connect metric sources from your warehouse for outcome tracking.

For cloud customers, Statsig processes and trains the data entirely on its servers.

SDK support

Statsig supports Contextual Autotune in all client SDKs. Among server SDKs, it is supported only in the following:

Node
Python
Java
Elixir
Rust
Go v1.39.0+
Ruby v2.4.0+
