Pre-Experiment Bias

How Statsig detects and corrects for pre-experiment bias caused by uneven user distributions between treatment and control groups before exposure.

In some cases, users in two experiment groups can have meaningfully different average behaviors before your experiment applies any intervention to them. If this difference persists after your experiment starts, experiment analysis may attribute that pre-existing difference to your intervention, making a result appear more or less significant than it is.

CUPED helps address this bias, but can't fully account for it. Some metrics, such as retention, aren't suitable candidates for CUPED and can't be easily adjusted.
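As a rough illustration of why CUPED helps, the sketch below applies the textbook CUPED adjustment, using each user's pre-experiment value as the covariate. It is not Statsig's implementation; the function name `cuped_adjust` and the toy data are assumptions made for this example.

```python
from statistics import fmean


def cuped_adjust(pre, post):
    """Return CUPED-adjusted in-experiment values (hypothetical helper).

    pre  -- each user's pre-experiment metric value (the covariate)
    post -- each user's in-experiment metric value, in the same order
    Both lists should pool users from every experiment group.
    """
    pre_mean = fmean(pre)
    post_mean = fmean(post)
    # theta = cov(pre, post) / var(pre), estimated on the pooled data
    cov = sum((x - pre_mean) * (y - post_mean) for x, y in zip(pre, post))
    var = sum((x - pre_mean) ** 2 for x in pre)
    theta = cov / var if var else 0.0
    # Remove the part of each user's value that is predicted by their pre-period value
    return [y - theta * (x - pre_mean) for x, y in zip(pre, post)]


# Toy data: the test group starts higher, and the intervention has no real effect.
control_pre, test_pre = [1, 2, 3, 4], [2, 3, 4, 5]
control_post = [x + 1 for x in control_pre]
test_post = [x + 1 for x in test_pre]

adjusted = cuped_adjust(control_pre + test_pre, control_post + test_post)
raw_delta = fmean(test_post) - fmean(control_post)
adjusted_delta = fmean(adjusted[4:]) - fmean(adjusted[:4])
```

In this idealized case the pre-period value predicts the in-experiment value perfectly, so the adjustment removes the entire pre-existing gap. With real data the correlation is weaker, and part of the bias remains.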

Statsig proactively measures the pre-experiment values of all scorecard metrics for all experiment groups, and determines whether the values are significantly different and could cause misinterpretations. If Statsig detects bias, it notifies users and places a warning on relevant Pulse results.

How it works

Statsig provides a "Days Since Exposure" view to help identify novelty effects and pre-experiment effects. For example, the test group in the experiment below had a consistently higher mean than the control group in the week before exposure for this metric.

[Figure: Pre-experiment bias visualization showing the test group with a consistently higher mean than the control group]

Statsig detects this bias by running the standard Pulse calculation on the pre-experiment period (looking back one week in Cloud, and your configured CUPED lookback window in Warehouse Native) and calculating the p-value for the null hypothesis that the groups are identical. Because evaluating many scorecard metrics or groups increases the chance of false positives, Statsig flags results using logic that balances surfacing real bias against over-alerting.
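The idea behind this check can be sketched as a two-sample comparison of pre-experiment values. The example below is a simplified stand-in that assumes a two-sided z-test with a normal approximation; it does not reproduce the Pulse calculation or Statsig's flagging logic, and the name `pre_experiment_p_value` is hypothetical.

```python
import math
from statistics import fmean, variance


def pre_experiment_p_value(control_pre, test_pre):
    """Two-sided p-value for equal pre-experiment means (normal approximation).

    Each argument is a list of per-user metric values from the lookback window.
    """
    mean_c, mean_t = fmean(control_pre), fmean(test_pre)
    # Standard error of the difference in means, allowing unequal variances
    se = math.sqrt(
        variance(control_pre) / len(control_pre)
        + variance(test_pre) / len(test_pre)
    )
    if se == 0:
        return 1.0 if mean_c == mean_t else 0.0
    z = (mean_t - mean_c) / se
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


# Toy data: the test group's pre-experiment values are shifted up by 2.
control = [10, 11, 9, 10, 12, 10, 9, 11] * 10
shifted_test = [value + 2 for value in control]

p_same = pre_experiment_p_value(control, control)
p_shifted = pre_experiment_p_value(control, shifted_test)
```

A small p-value on pre-experiment data suggests the groups differed before any intervention. When many metrics or groups are checked at once, some small p-values are expected by chance alone, so a fixed threshold applied to each metric would over-alert.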

What to do

Pre-experiment bias can occur by chance and is not always a major issue.

  • If the total delta is small, it may not meaningfully influence your interpretation of results.
  • If CUPED can account for the bias, the bias shouldn't affect your results.

In many cases, the warning is informational and you can proceed while treating impacted metrics with caution. This is often appropriate if the metric isn't critical to the experiment, or if you care more about the directional movement than the exact number. Additional time may also reduce the bias when there is no systemic source, as new users dilute the initial imbalance.
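The dilution effect can be approximated with simple arithmetic. The sketch below assumes the initial imbalance arose by chance, that later-enrolled users carry no imbalance in expectation, and that every user contributes equally to the metric. The function name `diluted_delta` is hypothetical.

```python
def diluted_delta(initial_delta, initial_users, new_users):
    """Expected remaining pre-existing delta after new users enroll.

    Assumes new users add no imbalance on average, so the original delta is
    weighted by the share of users who carried it.
    """
    total_users = initial_users + new_users
    if total_users <= 0:
        raise ValueError("total user count must be positive")
    return initial_delta * initial_users / total_users


# A delta of 2.0 among the first 1,000 users, after 3,000 more users enroll
remaining = diluted_delta(2.0, 1000, 3000)
```

Under these assumptions, quadrupling the exposed population leaves a quarter of the original delta. If the imbalance has a systemic source, new users carry it too and this dilution does not occur.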

If the metric is critical to your analysis and you need the exact numerical value, consider resalting the experiment, which re-randomizes user assignment, and restarting it.
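To see why resalting reshuffles the groups, consider a generic salted-hash assignment scheme. The sketch below is an illustration only: the hashing details, the `assign_group` function, and the salt strings are assumptions for this example, not a description of Statsig's internals.

```python
import hashlib


def assign_group(user_id, salt, groups=("control", "test")):
    """Deterministically map a user to a group from a salted hash."""
    digest = hashlib.sha256(f"{salt}.{user_id}".encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer bucket
    bucket = int.from_bytes(digest[:8], "big") % len(groups)
    return groups[bucket]


users = [f"user-{i}" for i in range(1000)]
before = [assign_group(user, "salt-v1") for user in users]
after = [assign_group(user, "salt-v2") for user in users]

# Count how many users landed in a different group after the salt changed
moved = sum(1 for old, new in zip(before, after) if old != new)
```

The same salt always yields the same assignment for a given user, while a new salt produces an independent split. A chance imbalance in the original split is therefore unlikely to carry over to the restarted experiment.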