Contextual Bandit Methodology

Methodology behind Statsig Contextual Bandits, including the contextual algorithm, exploration strategy, model retraining, and reward attribution.

How contextual bandits work

This page covers the high-level approach that Statsig takes to running contextual bandits across cloud and Warehouse Native deployments. Implementation details change frequently as Statsig experiments and optimizes its approaches, so this documentation is intentionally high-level.

Core approach

The implementation follows the disjoint linear model methodology (LinUCB) from Li, Chu, Langford, and Schapire. One model is trained per variant, and estimated confidence intervals (CIs) are computed for each. When contextual autotune is triggered, the latest model version estimates the user's outcome under each variant and scores the variant by the upper bound of the 95% CI around that estimate.
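
As an illustration of the disjoint, upper-confidence-bound idea described above, here is a minimal sketch for the continuous-outcome (ridge) case only. It is not Statsig's implementation: the class name `DisjointRidgeArm`, the function `choose_variant`, the `l2` default, and the use of `z = 1.96` as a stand-in for a 95% CI are all assumptions made for the example.

```python
import math


class DisjointRidgeArm:
    """One ridge-regression model per variant (hypothetical sketch)."""

    def __init__(self, n_features, l2=1.0):
        # a_inv starts as (l2 * I)^-1; b accumulates reward-weighted contexts.
        self.a_inv = [[(1.0 / l2 if i == j else 0.0) for j in range(n_features)]
                      for i in range(n_features)]
        self.b = [0.0] * n_features

    def _a_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^-1 after observing (x, reward).
        ax = self._a_inv_times(x)
        denom = 1.0 + sum(xi * axi for xi, axi in zip(x, ax))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(n):
            self.b[i] += reward * x[i]

    def upper_bound(self, x, z=1.96):
        # Point estimate from the ridge coefficients theta = A^-1 b.
        theta = self._a_inv_times(self.b)
        estimate = sum(t * xi for t, xi in zip(theta, x))
        # Half-width grows when this arm has seen little data near x.
        ax = self._a_inv_times(x)
        half_width = z * math.sqrt(sum(xi * axi for xi, axi in zip(x, ax)))
        return estimate + half_width


def choose_variant(arms, x):
    # Pick the variant whose upper confidence bound is highest for context x.
    return max(arms, key=lambda name: arms[name].upper_bound(x))
```

With `arms = {"a": DisjointRidgeArm(2), "b": DisjointRidgeArm(2)}`, calling `choose_variant(arms, [1.0, 0.0])` returns the name of the variant to serve. An arm with no observations keeps a wide interval, so it can win even when its point estimate is zero.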

[Figure: methodology workflow]

Statsig models categorical outcomes as logistic regression with L2 penalty. Statsig models continuous outcomes as multivariate ridge regression.

Training data and sampling

To keep data relevant, contextual autotune data is upsampled to prefer recent dates. Sampling uses these mechanisms:

  • A flat number of samples is selected, preferring the most recent records.
  • Per day, over the last two weeks, samples are chosen to prefer more recent records.
  • Samples from the explore dataset are strictly preferred, but non-explore data may be used to satisfy sample requirements. Records are then prioritized by a unit-ID hash to maintain stability in the training set between runs and avoid major jitter.
  • A sample set is chosen per variant to avoid bias from a dominant model being overrepresented in the training data.
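
The sampling mechanisms above can be sketched roughly as follows. This is a simplified illustration, not the production pipeline: the record fields (`unit_id`, `variant`, `age_days`, `is_explore`), the linear decay in `daily_quota`, and the default quota are all hypothetical.

```python
import hashlib
from collections import defaultdict


def stable_hash(unit_id, salt="training-sample"):
    # Deterministic across runs, unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode("utf-8")).hexdigest()
    return int(digest[:16], 16)


def daily_quota(age_days, base_quota=1000, window_days=14):
    # Assumed linear decay: the newest day gets base_quota, older days get less,
    # and anything outside the window gets nothing.
    if age_days < 0 or age_days >= window_days:
        return 0
    return max(1, base_quota * (window_days - age_days) // window_days)


def select_training_sample(records, base_quota=1000, window_days=14):
    # Group by (variant, day) so every variant gets its own sample set.
    groups = defaultdict(list)
    for r in records:
        groups[(r["variant"], r["age_days"])].append(r)

    selected = []
    for (variant, age), rows in sorted(groups.items()):
        quota = daily_quota(age, base_quota, window_days)
        # Explore rows sort first (False < True); within each group, order by a
        # stable unit-ID hash so the same units are picked on every run.
        rows.sort(key=lambda r: (not r["is_explore"], stable_hash(r["unit_id"])))
        selected.extend(rows[:quota])
    return selected
```

Because ordering depends only on the explore flag and the unit-ID hash, rerunning the selection on the same data in a different order yields the same training set, which is the stability property the list describes.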

If a variant has very low volume, it has low representation in the training data. Lower representation widens that variant's CIs, which raises the upper bound and makes the variant more likely to be selected. This acts as a bounce-back mechanism for low-traffic variants.
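
To make the bounce-back effect concrete, consider the standard disjoint ridge formulation. This is an assumption about the form of the interval, used here for illustration only; the exact CI calculation Statsig uses is not specified on this page. For variant $a$ with $n_a$ training rows $x_i$, penalty $\lambda$, and critical value $z$:

```latex
\[
A_a = \lambda I + \sum_{i=1}^{n_a} x_i x_i^{\top}, \qquad
\text{half-width}_a(x) = z \sqrt{x^{\top} A_a^{-1} x}
\]
\[
\text{Intercept-only case } (x = 1):\quad
\text{half-width}_a = \frac{z}{\sqrt{\lambda + n_a}}
\]
```

With $z = 1.96$ and $\lambda = 1$, a variant with $n_a = 99$ rows has a half-width of $0.196$, while one with $n_a = 9{,}999$ rows has a half-width of $0.0196$. The low-volume variant's upper bound sits ten times further above its point estimate.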

Model and feature updates

Models are updated hourly. If a model definition changes (features or target outcome), all data is reset, and the model retrains to match the new definition on the next hourly update.

The pre-training data pipeline is available in the history view on the results page, showing the SQL used and the caching tables where data is stored. You can use this data to validate or explore modeling approaches.

Feature encoding

Features with numerical-only values are treated as continuous random variables. All others are string-encoded and one-hot-encoded into binary variables for regression, using the top 25 levels available in the data with more than 1% coverage.
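
A rough sketch of the encoding rule described above, for illustration only. The function names, the treatment of "numerical-only" as Python `int`/`float` values, coverage measured as a share of all rows, and filling missing numeric values with `0.0` are assumptions, not documented behavior.

```python
from collections import Counter


def is_numeric(value):
    # bool is excluded so True/False are treated as categories.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def build_encoder(rows, feature, max_levels=25, min_coverage=0.01):
    values = [row[feature] for row in rows if row.get(feature) is not None]
    if values and all(is_numeric(v) for v in values):
        return {"feature": feature, "kind": "continuous"}
    # Otherwise string-encode and keep the most common levels above the
    # coverage threshold (at most max_levels of them).
    counts = Counter(str(v) for v in values)
    total = len(rows)
    levels = [lvl for lvl, n in counts.most_common(max_levels)
              if n / total > min_coverage]
    return {"feature": feature, "kind": "categorical", "levels": levels}


def encode(row, encoder):
    value = row.get(encoder["feature"])
    if encoder["kind"] == "continuous":
        # Assumption: a missing numeric value is encoded as 0.0.
        return [float(value) if value is not None else 0.0]
    # One binary indicator per retained level; unseen levels map to all zeros.
    return [1.0 if str(value) == lvl else 0.0 for lvl in encoder["levels"]]
```

Levels that fall outside the retained set encode as an all-zero vector, so rare values are effectively pooled together.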

Arrays of categories or tags are either not supported or are encoded only for the most common tag sets. Provide tags as individual key-value pairs in the user custom object instead.
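
One way to prepare such data before passing it in is to flatten an array of tags into separate keys. The helper below is a hypothetical sketch: the function name, the `<prefix>_<tag>` key naming, and the `"true"` value are illustrative choices, not a Statsig convention.

```python
def flatten_tags(custom, array_key, prefix=None):
    # Copy every field except the array, then add one key per tag.
    prefix = prefix if prefix is not None else array_key
    flat = {k: v for k, v in custom.items() if k != array_key}
    for tag in custom.get(array_key, []):
        flat[f"{prefix}_{tag}"] = "true"
    return flat
```

For example, `flatten_tags({"plan": "pro", "interests": ["golf", "tea"]}, "interests")` produces a dictionary with the keys `plan`, `interests_golf`, and `interests_tea`, each of which can then be encoded as its own feature.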

Model monitoring

Diagnostics for model characteristics over time aren't currently available. Model coefficients are visible in the results tab. You can view a comparison of performance between naive random traffic and targeted traffic to determine whether model performance relative to blind allocation improves or degrades over time.
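
If you export outcome data yourself, the random-versus-targeted comparison can be tracked over time with something like the sketch below. The row fields (`period`, `is_explore`, `outcome`) and the function name are hypothetical, and a simple difference in means is only one possible summary.

```python
from statistics import mean


def lift_by_period(rows):
    # rows: dicts with "period", "is_explore" (True = random allocation),
    # and a numeric "outcome".
    result = {}
    for period in sorted({r["period"] for r in rows}):
        explore = [r["outcome"] for r in rows
                   if r["period"] == period and r["is_explore"]]
        targeted = [r["outcome"] for r in rows
                    if r["period"] == period and not r["is_explore"]]
        # Skip periods where one side has no data.
        if explore and targeted:
            result[period] = mean(targeted) - mean(explore)
    return result
```

A lift that shrinks toward zero across periods would suggest the model's advantage over blind allocation is degrading.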