Contextual Bandit (Autotune AI)
Introduction to Statsig Contextual Bandits, which choose the best variant per user based on context features and continuous learning from outcomes.
Contextual Multi-Armed Bandits are a subset of Multi-Armed Bandits that use context about a user to personalize their experience. Autotune AI predicts an outcome for each variant and selects the variant with the best prediction after accounting for uncertainty. As a result, Autotune AI can prefer a variant with high uncertainty over one with a slightly better prediction but low uncertainty, which drives exploration.
Use cases
Contextual bandits bridge the gap between un-personalized solutions and fully fledged ranking solutions. The main limitations are that contextual bandits:
- Have a fixed output set of variants they can show
- Have limited ability to account for complex context about the "object" being shown, or to predict for novel content (e.g. video ranking)
Their simplicity also provides advantages. Statsig's Autotune AI evaluates in near-real-time on both the server and client, taking a few milliseconds or less to return the ideal experience for a given user context. Contextual bandits are also simple to set up and test: you can set up a test in less than an hour, get model results the next hour, and start seeing experiment results the hour after that in the Statsig console.
For more discussion on use cases and motivations, refer to the Statsig blog.
Methodology
Statsig's Autotune AI uses a LinUCB-based approach. This paper is a good introduction to the topic: Li, Chu, Langford, Schapire. For coverage of regret analysis, these lecture notes by Jain from the University of Washington are a useful resource.
Autotune AI works with categorical and numerical features. Key-value pairs attached to the custom object on the Statsig user are converted into categorical or numerical features based on their data type. Categorical features are one-hot-encoded. You don't need to build complex training pipelines, though many customers pass pre-evaluated user attributes or predictions as context objects.
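As a rough illustration of the feature conversion described above, the sketch below turns key-value pairs into a numeric vector, one-hot-encoding the categorical values. It is not Statsig's implementation: the helper names (`is_numeric`, `fit_columns`, `encode`), the `key=value` column naming, and the handling of unseen categories are all assumptions made for the example.

```python
from typing import Dict, List, Union

Value = Union[str, int, float, bool]


def is_numeric(value: Value) -> bool:
    # Assumption: booleans are treated as categorical, not numeric.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def fit_columns(rows: List[Dict[str, Value]]) -> List[str]:
    """Collect one column per numeric key and one per (key, category) pair."""
    columns = set()
    for row in rows:
        for key, value in row.items():
            columns.add(key if is_numeric(value) else f"{key}={value}")
    return sorted(columns)


def encode(row: Dict[str, Value], columns: List[str]) -> List[float]:
    """Encode one context as a vector; categories not seen in training stay all-zero."""
    vector = dict.fromkeys(columns, 0.0)
    for key, value in row.items():
        name = key if is_numeric(value) else f"{key}={value}"
        if name in vector:
            vector[name] = float(value) if is_numeric(value) else 1.0
    return [vector[name] for name in columns]


# Hypothetical custom fields, for illustration only.
training_rows = [{"plan": "pro", "age": 30}, {"plan": "free", "age": 41}]
columns = fit_columns(training_rows)
encoded = encode({"plan": "pro", "age": 30}, columns)
```

Here `columns` is `["age", "plan=free", "plan=pro"]`, so the example context encodes to `[30.0, 0.0, 1.0]`.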
Statsig selects the best model (for example, Ridge or Logistic regression) based on your data types and performance, then generates a model from your data. The estimated standard error of the model produces a prediction confidence interval. During evaluation, Statsig uses user context to predict an outcome for each variant and applies the corresponding confidence interval to that prediction. The best variant is the one with the highest upper end of a 95% confidence interval. To adjust the interval size, modify the exploration parameter on the Autotune setup page.
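The selection rule above can be sketched in a few lines. This is a simplified illustration rather than Statsig's code: it assumes each variant already has a predicted outcome and a standard error, and it treats the exploration parameter as the multiplier `z` on the standard error (1.96 for a two-sided 95% interval). The function name and the example numbers are hypothetical.

```python
from typing import Dict, Tuple


def pick_variant(estimates: Dict[str, Tuple[float, float]], z: float = 1.96) -> str:
    """Return the variant with the highest upper confidence bound.

    estimates maps variant name -> (predicted outcome, standard error).
    z scales the interval width; larger values favor exploration.
    """
    return max(estimates, key=lambda name: estimates[name][0] + z * estimates[name][1])


# Hypothetical predictions for one user context.
estimates = {
    "A": (0.10, 0.005),  # better prediction, low uncertainty
    "B": (0.09, 0.020),  # slightly worse prediction, high uncertainty
}

explore_choice = pick_variant(estimates)          # upper bounds: A = 0.1098, B = 0.1292
greedy_choice = pick_variant(estimates, z=0.0)    # no interval: compares predictions only
```

With the default interval, `B` wins because its upper bound is higher even though its prediction is lower; with `z=0.0` the choice falls back to `A`. Shrinking the interval therefore reduces exploration.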
For a detailed discussion, refer to the Methodology page.
Drawbacks
Because Statsig manages the models, it can't guarantee perfect model tuning, and more advanced models such as neural networks aren't available. If recommendations are a critical business problem, this feature can serve as a starting point but isn't an appropriate long-term solution.
The current approach balances simplicity, speed, and regret minimization. Specific use cases such as real-time updates may not be fully supported.
Because the models generally assume linearity, they may not capture complex user interactions. This approach works best for broad-level effects, though feature interaction terms can provide reasonable predictive power for conditional relationships between predictors and outcomes.
Outcome types
Autotune AI supports multiple model types internally, covering both classification use cases (for example, whether a user clicks a button) and continuous outcomes (for example, how much time a user spends reading articles). You can optimize for both "outcomes" and "metrics." To minimize a metric such as latency, disable the "higher is better" setting for that metric.
For classification cases, Autotune AI identifies whether any outcome occurs within its attribution window. For continuous cases, Autotune AI requires an event name and field name, and uses the numerical value associated with that field.
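One way to picture the two outcome types is the labeling sketch below. It is an illustration under stated assumptions, not Statsig's pipeline: the event shape (`name`, `ts`, `fields`), the function names, and in particular the choice to sum multiple values within the window are hypothetical, since the aggregation is not specified here.

```python
from typing import Any, Dict, List, Optional

# Hypothetical event shape: {"name": str, "ts": float, "fields": dict}
Event = Dict[str, Any]


def _in_window(event: Event, exposure_ts: float, window_s: float) -> bool:
    return exposure_ts <= event["ts"] <= exposure_ts + window_s


def binary_outcome(events: List[Event], event_name: str,
                   exposure_ts: float, window_s: float) -> int:
    """1 if any matching event falls inside the attribution window, else 0."""
    return int(any(
        e["name"] == event_name and _in_window(e, exposure_ts, window_s)
        for e in events
    ))


def continuous_outcome(events: List[Event], event_name: str, field: str,
                       exposure_ts: float, window_s: float) -> Optional[float]:
    """Numeric value of `field` on matching events (summed: an assumption)."""
    values = [
        float(e["fields"][field])
        for e in events
        if e["name"] == event_name
        and _in_window(e, exposure_ts, window_s)
        and field in e.get("fields", {})
    ]
    return sum(values) if values else None


events = [
    {"name": "click", "ts": 130.0, "fields": {}},
    {"name": "read_time", "ts": 110.0, "fields": {"seconds": 12.5}},
    {"name": "read_time", "ts": 150.0, "fields": {"seconds": 7.5}},
    {"name": "read_time", "ts": 900.0, "fields": {"seconds": 99.0}},  # outside window
]
clicked = binary_outcome(events, "click", exposure_ts=100.0, window_s=60.0)
seconds_read = continuous_outcome(events, "read_time", "seconds",
                                  exposure_ts=100.0, window_s=60.0)
```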
Training
Training pipelines are run hourly.
For Warehouse Native customers, Statsig processes data in your warehouse and uses an anonymized feature set to train the models. Statsig exports exposures on-demand for each load up to the first million, and in daily batches after that. Log events sent to Statsig are exported hourly if you use Statsig to log outcomes, or you can connect metric sources from your warehouse for outcome tracking.
For cloud customers, Statsig processes the data and trains the models entirely on its servers.
SDK support
Statsig supports contextual autotune in all Client SDKs, but only in the following server SDKs: