Variance Reduction

Overview of variance reduction techniques in Statsig experiments, including CUPED, stratified sampling, and regression adjustment for higher sensitivity.

Variance measures the amount of noise in a metric or in experiment results. Higher variance produces wider confidence intervals and requires a larger sample to consistently observe a statistically significant result for the same effect size. Reducing variance shortens experiment run times because fewer samples are needed.

Statsig uses a form of CUPED based on a 2013 Microsoft paper (Deng, Xu, Kohavi, & Walker). Statsig automatically applies CUPED to experiments and runs it for the topline results on key metrics in Pulse. This produces significant variance reduction for the large majority of metrics where CUPED can be applied. For more details, refer to the launch post for CUPED.
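To make the link between variance and sample size concrete, the sketch below uses the textbook two-sample z-test approximation for the sample needed per group. The function name and default parameters are illustrative assumptions, not Statsig's power calculation.

```python
import math
from statistics import NormalDist


def required_sample_per_group(variance, min_effect, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided, two-sample z-test.

    Assumes equal group sizes and equal variance in both groups.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * variance / min_effect ** 2)


# Same effect size, but the second metric has half the variance.
n_high = required_sample_per_group(variance=4.0, min_effect=0.5)
n_low = required_sample_per_group(variance=2.0, min_effect=0.5)
print(n_high, n_low)
```

Under this approximation the required sample scales linearly with variance, so halving the variance roughly halves the sample needed to detect the same effect.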

CUPED - Controlled-experiment Using Pre-Existing Data

CUPED (short for Controlled-experiment Using Pre-Existing Data) is a technique that uses user information from before an experiment to reduce variance and increase confidence in experimental metrics. At Statsig, this pre-experiment data covers the 7 days before each user's exposure, rather than a fixed window before the experiment starts for all users. This helps reduce bias in experiments where groups happened to differ, by chance, before any treatment was applied.
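The core adjustment can be sketched as follows, using the general formulation from the Deng et al. paper rather than Statsig's exact implementation. Each user's in-experiment value is shifted by a multiple of how far their pre-experiment value sits from the mean. The function name `cuped_adjust` and the simulated data are assumptions for illustration only. In practice the coefficient is typically estimated once on pooled treatment and control data.

```python
import random
from statistics import mean, variance


def cuped_adjust(post, pre):
    """Return CUPED-adjusted values: y - theta * (x - mean(x)).

    post: in-experiment metric values, one per user.
    pre:  pre-experiment values of the same metric, in the same user order.
    """
    pre_mean = mean(pre)
    post_mean = mean(post)
    # Sample covariance between pre- and in-experiment values.
    cov = sum((x - pre_mean) * (y - post_mean) for x, y in zip(pre, post)) / (len(pre) - 1)
    theta = cov / variance(pre)
    return [y - theta * (x - pre_mean) for x, y in zip(pre, post)]


# Simulated users whose in-experiment metric correlates with past behavior.
rng = random.Random(0)
pre = [rng.gauss(10, 3) for _ in range(2000)]
post = [0.8 * x + rng.gauss(0, 1) for x in pre]

adjusted = cuped_adjust(post, pre)
print(variance(post), variance(adjusted))
```

The adjustment leaves the mean of the metric unchanged and removes the portion of variance explained by pre-experiment behavior, so the stronger the correlation between the two periods, the larger the reduction.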

The Cloud product uses stratification alongside CUPED to account for users who may not have pre-experiment data. Statsig groups users into strata based on the pre-experiment information available for them. It first estimates treatment and control effects within each stratum, then aggregates them to produce an overall result, and then applies the standard difference-in-means and variance estimation. This approach retains users with missing pre-experiment data while still providing variance reduction where applicable.
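A minimal sketch of the estimate-then-aggregate idea is shown below. The stratum labels (`has_pre`, `no_pre`), the row layout, and the choice to weight each stratum by its share of users are all assumptions made for illustration. The source does not specify how Statsig defines strata or weights them.

```python
from collections import defaultdict
from statistics import mean


def stratified_effect(rows):
    """Estimate an overall treatment effect from per-stratum differences.

    rows: iterable of (stratum, group, value), group in {"control", "treatment"}.
    Each stratum's difference in means is weighted by its share of all users.
    """
    by_stratum = defaultdict(lambda: {"control": [], "treatment": []})
    for stratum, group, value in rows:
        by_stratum[stratum][group].append(value)

    total = sum(len(g["control"]) + len(g["treatment"]) for g in by_stratum.values())
    effect = 0.0
    for groups in by_stratum.values():
        weight = (len(groups["control"]) + len(groups["treatment"])) / total
        effect += weight * (mean(groups["treatment"]) - mean(groups["control"]))
    return effect


rows = [
    ("has_pre", "control", 10), ("has_pre", "control", 12),
    ("has_pre", "treatment", 13), ("has_pre", "treatment", 15),
    ("no_pre", "control", 1), ("no_pre", "treatment", 2),
]
overall = stratified_effect(rows)
print(overall)
```

Users without pre-experiment data stay in the analysis through their own stratum instead of being dropped.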

Winsorization

Winsorization is another technique for reducing noise by managing the influence of outliers.

Winsorization measures the percentile P<sub>x</sub> of a metric and sets all values above P<sub>x</sub> to P<sub>x</sub>. This reduces the influence of extreme outliers caused by factors such as logging errors or bad actors.
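As a rough illustration of the capping step, the sketch below uses a nearest-rank percentile. The function name, the 99th-percentile default, and the percentile method are assumptions, and the threshold Statsig applies may differ.

```python
import math


def winsorize_upper(values, percentile=99.0):
    """Cap every value above the given percentile at that percentile's value."""
    ordered = sorted(values)
    # Nearest-rank percentile: the smallest value with at least
    # `percentile` percent of observations at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]


# 99 ordinary values plus one extreme outlier (for example, a logging error).
values = list(range(1, 100)) + [10000]
capped = winsorize_upper(values, percentile=99.0)
print(max(values), max(capped))
```

Only the extreme value changes. Every observation is kept, so the sample size is unaffected.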

Metric selection

The metrics you use can significantly influence the sensitivity of your analysis. The transformations described above, combined with techniques such as creating threshold-based flags, allow you to trade exact numbers for more statistical power. For more information, refer to the blog post on understanding and reducing variance.
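To illustrate the trade-off with a threshold-based flag, the sketch below converts a heavy-tailed spend metric into a binary "made a purchase" flag and compares relative noise (standard deviation divided by mean). The data, the threshold, and the helper names are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Relative noise of a metric: standard deviation divided by mean."""
    return stdev(values) / mean(values)


def to_flag(values, threshold):
    """Replace each value with 1 if it meets the threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]


# Hypothetical per-user spend with a single very large purchase.
spend = [0] * 50 + [5] * 40 + [20] * 9 + [5000]
purchased = to_flag(spend, threshold=1)

raw_cv = coefficient_of_variation(spend)
flag_cv = coefficient_of_variation(purchased)
print(raw_cv, flag_cv)
```

The flag discards how much each user spent, but its relative noise is far lower because one extreme purchase no longer dominates the metric.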
