Methodology
This page covers the high-level approach that Statsig takes to running contextual bandits across cloud and warehouse-native deployments. Implementation specifics change frequently as we experiment with and optimize our approaches, so this documentation is deliberately high level.

Core Approach
Statsig's implementation is close to the disjoint-model methodology from Li, Chu, Langford, and Schapire (LinUCB). In short, several models are trained, one per variant, and confidence intervals are modeled for their estimates. When contextual Autotune is triggered, the latest version of each model is used to estimate the user's outcome, plus the upper end of the 95% CI for that estimate; the variant with the highest resulting upper bound is served.
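To make this concrete, here is a minimal sketch of a disjoint-model (LinUCB-style) selection rule in Python. It is illustrative only: the class and parameter names (`DisjointLinUCB`, `alpha`) are assumptions rather than Statsig's actual implementation, with `alpha` set to roughly the 95% CI multiplier mentioned above.

```python
import numpy as np

class DisjointLinUCB:
    """Sketch of disjoint-model contextual bandit selection (Li et al., LinUCB).

    One ridge-regression model is kept per variant; at decision time each model
    scores the user's context and the variant with the highest upper confidence
    bound is served. Names and the alpha value are illustrative assumptions.
    """

    def __init__(self, variant_names, n_features, alpha=1.96):
        self.alpha = alpha  # ~95% CI width multiplier (assumption)
        # Per-variant sufficient statistics for ridge regression:
        # A is a d x d design matrix, b is a d-dimensional response vector.
        self.A = {v: np.eye(n_features) for v in variant_names}
        self.b = {v: np.zeros(n_features) for v in variant_names}

    def select(self, context):
        """Return the variant with the highest UCB score for this context."""
        scores = {}
        for v in self.A:
            A_inv = np.linalg.inv(self.A[v])
            theta = A_inv @ self.b[v]                 # point estimate of outcome
            width = self.alpha * np.sqrt(context @ A_inv @ context)  # CI half-width
            scores[v] = context @ theta + width       # upper end of the CI
        return max(scores, key=scores.get)

    def update(self, variant, context, reward):
        """Fold an observed (context, reward) pair into that variant's model."""
        self.A[variant] += np.outer(context, context)
        self.b[variant] += reward * context
```

A serving loop would call `select()` with the user's feature vector to choose a variant, then call `update()` once the outcome for that exposure is logged.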
Training Data and Sampling
To keep data relevant, contextual Autotune training data is preferentially upsampled toward recent dates. Specifically, there are two mechanisms for sampling (a simplified sketch follows the list below):

- A flat number of samples is selected, preferring the most recent records
- Per day, within the last two weeks, samples are chosen with a preference for more recent records
- Samples from the explore dataset are strictly preferred, but non-explore data may be chosen afterward to satisfy sample requirements. Records are then prioritized by a unit-ID hash to keep the training set stable between subsequent runs and avoid major jitter
- A sample set is chosen per variant using this methodology, to avoid bias from a dominant model being overrepresented in the training data
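The sketch below illustrates the sampling preferences described above under stated assumptions. The record field names (`date`, `unit_id`, `is_explore`, `variant`), the hash function, and the collapse of the two sampling mechanisms into a single ranked selection are simplifications, not Statsig's actual pipeline.

```python
import hashlib

def stable_priority(unit_id: str) -> int:
    """Deterministic per-unit priority so the same units tend to be re-selected
    across training runs, reducing jitter in the training set."""
    return int(hashlib.md5(unit_id.encode()).hexdigest(), 16)

def sample_training_data(records, n_samples):
    """Rank records by the preferences above, then take the top n_samples.

    Ordering (an assumption about precedence): explore records strictly first,
    then more recent dates, then the stable unit-ID hash as the tie-break.
    """
    ranked = sorted(
        records,
        key=lambda r: (r["is_explore"], r["date"], -stable_priority(r["unit_id"])),
        reverse=True,
    )
    return ranked[:n_samples]

def sample_per_variant(records, n_samples_per_variant):
    """Sample independently for each variant so a dominant variant's traffic
    does not crowd out the others in the training data."""
    by_variant = {}
    for r in records:
        by_variant.setdefault(r["variant"], []).append(r)
    return {
        v: sample_training_data(rs, n_samples_per_variant)
        for v, rs in by_variant.items()
    }
```

Ranking by a unit-ID hash rather than sampling randomly means consecutive training runs draw largely the same units, which keeps the training set from shifting purely due to resampling.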