Monitoring your Contextual MAB

Monitor the health and performance of Statsig Contextual Bandits, including traffic allocation, reward signals, model drift, and exploration coverage.

The sections below describe the primary ways to monitor autotune performance.

Linked experiments

The most reliable way to evaluate whether a bandit is working is to measure whether it drives more of the targeted behavior compared to a baseline experience. You can set up and link an A/B test in Statsig to evaluate this, which also lets you monitor other user behaviors and guardrail metrics.

Linking an A/B test is the most rigorous form of measurement, and Statsig highly encourages it.

Standard practice is to wrap the autotune in an experiment with a binary parameter, using either a 50/50 split or a 90/10 holdback. Link the experiment to the autotune to see its results on the autotune page. In code, this might look like:

plaintext

experiment_value = statsig.get_experiment('wrapping_experiment').get('flag')
default_param = '...'
if experiment_value:
  # Test group: serve the value chosen by the autotune
  param = statsig.get_experiment('autotune').get('param_name')
else:
  # Holdback group: serve the baseline experience
  param = default_param

# use param in code

Start this experiment at the same time you launch your autotune.
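
Once the wrapping experiment has collected data, the comparison it supports is a difference in success rate between the autotune group and the holdback group. The sketch below illustrates that comparison with a plain two-proportion z-test. It is not how Statsig computes experiment results; the function name `lift_summary` and the counts are made up for illustration.

```python
import math

def lift_summary(bandit_successes, bandit_users, holdback_successes, holdback_users):
    """Compare success rates between the autotune group and the holdback group."""
    p_bandit = bandit_successes / bandit_users
    p_holdback = holdback_successes / holdback_users
    # Pooled rate under the null hypothesis that both groups convert equally
    pooled = (bandit_successes + holdback_successes) / (bandit_users + holdback_users)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / bandit_users + 1 / holdback_users))
    z = (p_bandit - p_holdback) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "absolute_lift": p_bandit - p_holdback,
        "relative_lift": (p_bandit - p_holdback) / p_holdback,
        "z": z,
        "p_value": p_value,
    }

# Made-up counts for a 90/10 holdback: 9,000 autotune users, 1,000 holdback users
summary = lift_summary(1080, 9000, 100, 1000)
print(summary)
```

With a 90/10 holdback the baseline group is small, so the standard error is dominated by the holdback side and a given lift takes longer to detect than with a 50/50 split.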

Success rate

Statsig tracks the cumulative and daily success rate of each variant over time. Success rates can be difficult to interpret because a contextual bandit serves each variant to a different mix of users. For example, variant A may show a lower CTR than the others, but the users served variant A might have had an even lower CTR on any other variant. Use this view for tracking and general understanding, and to identify outlier variants with notably high or low performance.
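
To make the difference between the daily and cumulative views concrete, here is a minimal sketch that derives both from per-day aggregates. The record layout `(day, variant, exposures, successes)` and the numbers are hypothetical, not a Statsig export format, and every row is assumed to have at least one exposure.

```python
from collections import defaultdict

# Hypothetical daily aggregates: (day, variant, exposures, successes)
records = [
    ("2024-01-01", "A", 100, 10),
    ("2024-01-01", "B", 100, 20),
    ("2024-01-02", "A", 50, 10),
    ("2024-01-02", "B", 150, 30),
]

def success_rates(rows):
    """Return {variant: [(day, daily_rate, cumulative_rate), ...]} in day order."""
    totals = defaultdict(lambda: [0, 0])  # variant -> [exposures, successes] so far
    out = defaultdict(list)
    for day, variant, exposures, successes in sorted(rows):
        totals[variant][0] += exposures
        totals[variant][1] += successes
        daily = successes / exposures
        cumulative = totals[variant][1] / totals[variant][0]
        out[variant].append((day, daily, cumulative))
    return dict(out)

rates = success_rates(records)
for variant, series in rates.items():
    print(variant, series)
```

In this made-up data, variant A's daily rate doubles on the second day while its cumulative rate moves much less, which is why the two views are worth reading together.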

Traffic allocation

Traffic allocation shows where Statsig is sending users who see your autotune. Use this view to identify whether a variant is dominating traffic or receiving no traffic at all.
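
If you track exposure counts yourself, the same check can be sketched in a few lines. The function name `traffic_flags` and the thresholds (80% for "dominating", 1% for "starved") are arbitrary assumptions chosen for illustration, not Statsig defaults.

```python
def traffic_flags(exposures, dominate_above=0.8, starve_below=0.01):
    """Return each variant's traffic share plus lists of dominating and starved variants."""
    total = sum(exposures.values())
    if total == 0:
        # No traffic yet, so there is nothing to flag
        return {}, [], []
    shares = {variant: count / total for variant, count in exposures.items()}
    dominating = [v for v, share in shares.items() if share >= dominate_above]
    starved = [v for v, share in shares.items() if share <= starve_below]
    return shares, dominating, starved

# Made-up exposure counts per variant
shares, dominating, starved = traffic_flags({"A": 850, "B": 150, "C": 0})
print(shares, dominating, starved)
```

A dominating variant is not necessarily a problem, since a bandit is expected to concentrate traffic on what works. A variant that never receives traffic is more likely to point to a setup issue.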

Model features

Statsig tracks and surfaces coefficients and feature importance. This information helps you identify which features are worth further study, or which populations may have unmet needs in your product.

Importance is an estimate of the influence of a feature on the outcome: how much the feature contributes to the prediction.
A positive coefficient means the feature is associated with the outcome being more likely (or, for continuous outcomes, with a higher value). A negative coefficient means the feature is associated with the outcome being less likely, or with a lower continuous value.
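
As an illustration of how coefficient signs translate into predictions, the sketch below assumes a simple logistic model for a binary outcome. This page does not specify the model Statsig uses, so treat the model form, the feature names, and the coefficient values as hypothetical.

```python
import math

def predicted_probability(intercept, coefficients, features):
    """Logistic prediction: sigmoid of the intercept plus the weighted feature sum."""
    score = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-score))

# Hypothetical coefficients: one positive, one negative
coefficients = {"is_mobile": 0.8, "is_new_user": -0.5}
intercept = -1.0

base = predicted_probability(intercept, coefficients, {"is_mobile": 0, "is_new_user": 0})
mobile = predicted_probability(intercept, coefficients, {"is_mobile": 1, "is_new_user": 0})
new_user = predicted_probability(intercept, coefficients, {"is_mobile": 0, "is_new_user": 1})

# The positive coefficient raises the predicted probability above the base;
# the negative coefficient lowers it.
print(base, mobile, new_user)
```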
