CUPED

How Statsig uses CUPED variance reduction to improve experiment sensitivity by adjusting for pre-experiment user behavior on metric values.

CUPED - Controlled-experiment Using Pre-Existing Data

CUPED (short for Controlled-experiment Using Pre-Existing Data) is a technique that uses user information from before an experiment to reduce variance and increase confidence in experimental metrics. Statsig defines this pre-experiment data as the 7 days before each user's exposure rather than a fixed window before the experiment starts for all users. This approach helps debias experiments that have meaningful pre-exposure bias (for example, groups that were randomly different before you applied any treatment).

CURE extends CUPED for Warehouse Native, so use CURE when you need covariates beyond a metric's own pre-experiment history, such as new-user experiments or metrics that aren't autocorrelated.

The Cloud product uses a 7-day window for CUPED calculation. For Warehouse Native customers, Statsig recommends a 7-day window, but you can customize it to any length.

For more details, refer to the Variance Reduction page. For an in-depth look at the methodology, refer to CURE by Statsig.

CUPED for simple aggregations

The original Microsoft paper describes the methodology for simple aggregations, as does the in-depth article on the technique.
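For orientation, the standard CUPED adjustment for a simple aggregation replaces each unit's metric value $Y$ with $Y - \theta (X - \bar{X})$, where $X$ is the same unit's pre-experiment value and $\theta = Cov(Y, X) / Var(X)$. The following is a minimal sketch of that textbook form, not Statsig's production code; the function name `cuped_adjust` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def cuped_adjust(y, x):
    """Return (adjusted values, theta) for the textbook CUPED adjustment."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # theta = Cov(Y, X) / Var(X), estimated on the pooled sample.
    theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    # Centering x keeps the mean of the adjusted metric equal to mean(y).
    return y - theta * (x - x.mean()), theta

# Simulated data (illustrative only): the in-experiment metric is
# correlated with the pre-experiment metric.
rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, size=5_000)            # pre-experiment values
y = 0.8 * x + rng.normal(0.0, 1.0, size=5_000)   # in-experiment values

y_cv, theta = cuped_adjust(y, x)
print(f"theta={theta:.3f}  var(y)={y.var():.3f}  var(y_cv)={y_cv.var():.3f}")
```

The adjusted values keep the same mean as the originals, while their variance shrinks in proportion to how strongly the pre-experiment values predict the in-experiment values.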

The Cloud product uses stratification alongside CUPED to account for users who may not have pre-experiment data. Statsig groups users into strata based on available pre-experimentation information, estimates treatment and control effects within each stratum, then aggregates them to produce an overall result. Statsig then applies the standard difference-in-means and variance estimation. This approach lets Statsig retain users with missing pre-data while still benefiting from variance reduction where applicable.
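The stratified estimate described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Statsig's implementation: it uses only two strata (units with and without pre-experiment data), weights each stratum by its share of units, and the function name `stratified_effect` is hypothetical.

```python
import numpy as np

def stratified_effect(y, treated, stratum):
    """Weighted average of per-stratum differences in means.

    Assumption: each stratum is weighted by its share of all units.
    Returns (effect estimate, variance of the estimate).
    """
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    stratum = np.asarray(stratum)
    effect, variance = 0.0, 0.0
    for s in np.unique(stratum):
        in_s = stratum == s
        weight = in_s.mean()
        y_t = y[in_s & treated]
        y_c = y[in_s & ~treated]
        effect += weight * (y_t.mean() - y_c.mean())
        variance += weight**2 * (
            y_t.var(ddof=1) / len(y_t) + y_c.var(ddof=1) / len(y_c)
        )
    return effect, variance

# Simulated data (illustrative only): units with pre-experiment data have a
# different baseline, and the true treatment effect is 0.5 in both strata.
rng = np.random.default_rng(7)
n_units = 4_000
has_pre = rng.random(n_units) < 0.7
treated = rng.random(n_units) < 0.5
y = 2.0 * has_pre + 0.5 * treated + rng.normal(0.0, 1.0, size=n_units)

effect, variance = stratified_effect(y, treated, has_pre)
naive_variance = (
    y[treated].var(ddof=1) / treated.sum()
    + y[~treated].var(ddof=1) / (~treated).sum()
)
print(f"effect={effect:.3f}  stratified var={variance:.5f}  naive var={naive_variance:.5f}")
```

In this sketch, units without pre-experiment data stay in the analysis as their own stratum, and removing the between-strata baseline difference lowers the variance of the overall estimate.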

CUPED for ratio metrics

The Microsoft paper also gives details on how to implement CUPED for metrics with a different analysis unit (Appendix B). On Statsig, this methodology extends to ratio metrics, where a numerator and a denominator represent each experiment unit. The variance reduction process finds the variance of the experiment data, the variance of the pre-experiment data, and the covariance between the two.

Denote the numerator, denominator, pre-experiment numerator, and pre-experiment denominator of a unit as $Y$, $N$, $X$, and $M$, respectively. Using the CUPED-reduced variance formula,

Var(\frac{Y_{cv}}{N_{cv}})=Var(\frac{Y}{N})+\theta^2 Var(\frac{X}{M})-2\theta Cov(\frac{Y}{N}, \frac{X}{M})

where Statsig finds optimal $\theta$ as

\frac{Cov(\frac{Y}{N}, \frac{X}{M})}{Var(\frac{X}{M})}

which, after a delta-method linearization of each ratio, expands to

\frac{Cov(\frac{Y}{\mu_N}-\frac{\mu_Y N}{\mu^2_N}, \frac{X}{\mu_M}-\frac{\mu_X M}{\mu^2_M})}{Var(\frac{X}{\mu_M}-\frac{\mu_X M}{\mu^2_M})}
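The expanded expression for $\theta$ can be computed directly from per-unit values. The following is a minimal sketch assuming one row per experiment unit; the helper names `linearize_ratio` and `ratio_theta` and the simulated data are illustrative, not part of any Statsig API.

```python
import numpy as np

def linearize_ratio(num, den):
    """Per-unit linearized ratio: num / mu_den - mu_num * den / mu_den**2."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    mu_num, mu_den = num.mean(), den.mean()
    return num / mu_den - mu_num * den / mu_den**2

def ratio_theta(y, n, x, m):
    """theta = Cov(lin(Y, N), lin(X, M)) / Var(lin(X, M))."""
    lin_exp = linearize_ratio(y, n)   # experiment-period ratio
    lin_pre = linearize_ratio(x, m)   # pre-experiment ratio
    return np.cov(lin_exp, lin_pre, ddof=1)[0, 1] / np.var(lin_pre, ddof=1)

# Simulated data (illustrative only): each unit has a persistent rate that
# drives both its pre-experiment and in-experiment numerators.
rng = np.random.default_rng(1)
n_units = 2_000
rate = rng.gamma(2.0, 1.0, size=n_units)
m = rng.poisson(5, size=n_units) + 1               # pre-experiment denominator
x = rate * m + rng.normal(0.0, 1.0, size=n_units)  # pre-experiment numerator
n = rng.poisson(5, size=n_units) + 1               # experiment denominator
y = rate * n + rng.normal(0.0, 1.0, size=n_units)  # experiment numerator

theta = ratio_theta(y, n, x, m)
print(f"theta={theta:.3f}")
```

As a sanity check on the sketch, passing the same data as both the experiment and pre-experiment period yields $\theta = 1$, because the covariance of a variable with itself is its variance.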

This gives the adjusted ratios for the control ($c$) and test ($t$) groups:

\frac{\hat{Y_{c}}}{\hat{N_{c}}}=\frac{Y_{c}}{N_{c}}-\theta( \frac{X_{c}}{M_{c}} - \mathbb{E}[R])

\frac{\hat{Y_{t}}}{\hat{N_{t}}}=\frac{Y_{t}}{N_{t}}-\theta( \frac{X_{t}}{M_{t}} - \mathbb{E}[R])

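$\mathbb{E}[R]$, the expected value of the pre-experiment ratio, is hard to derive. However, the expectation term is the same for both groups, so it cancels when the groups are compared and any common value can stand in for it. Substituting $\mathbb{E}[R]$ with $\frac{X_{c}}{M_{c}}$ transforms the formulas above into the following two: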

\frac{Y_{cv}(control)}{N_{cv}(control)}=\frac{Y(control)}{N(control)}

\begin{aligned}
\frac{Y_{cv}(test)}{N_{cv}(test)} &:= \frac{Y(control)}{N(control)} - \left(\frac{Y(control)}{N(control)} - \theta \frac{X(control)}{M(control)}\right) + \left(\frac{Y(test)}{N(test)} - \theta\frac{X(test)}{M(test)}\right) \\
&= \frac{Y(test)}{N(test)} - \theta\frac{X(test)}{M(test)} + \theta \frac{X(control)}{M(control)}
\end{aligned}
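A small numeric sketch of the two formulas above, assuming each group-level ratio is computed as the sum of numerators over the sum of denominators (an assumption of this example, as are the function name `cuped_ratio_means` and the toy numbers):

```python
import numpy as np

def cuped_ratio_means(y_c, n_c, x_c, m_c, y_t, n_t, x_t, m_t, theta):
    """Return (control, test) CUPED-adjusted ratios per the formulas above."""
    def ratio(num, den):
        # Assumption: group-level ratio = total numerator / total denominator.
        return np.sum(num) / np.sum(den)

    control_cv = ratio(y_c, n_c)  # control is left unchanged
    test_cv = ratio(y_t, n_t) - theta * ratio(x_t, m_t) + theta * ratio(x_c, m_c)
    return control_cv, test_cv

# Toy data: control ratio 6/3 = 2 with pre-experiment ratio 4/2 = 2;
# test ratio 9/3 = 3 with pre-experiment ratio 9/3 = 3.
control_cv, test_cv = cuped_ratio_means(
    y_c=[2, 4], n_c=[1, 2], x_c=[1, 3], m_c=[1, 1],
    y_t=[3, 6], n_t=[1, 2], x_t=[3, 6], m_t=[1, 2],
    theta=0.5,
)
print(control_cv, test_cv)
```

Here the unadjusted difference between groups is 1.0, but the test group was already 1.0 higher before the experiment. With $\theta = 0.5$, half of that pre-existing gap is removed, so the adjusted difference is 0.5.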

Using the optimal $\theta$, Statsig reduces group-level variance by applying the parameter to calculate the adjustment. Because $\theta$ is computed across groups, it doesn't necessarily reduce variance for any single group, or the sum of variances of all groups, but in most cases it does. Simulations show that 98.3% of metrics saw a decrease in variance through CUPED.

Statsig uses CUPED variance when all of the following are true:

- Core assumptions of the CUPED model hold (rounding error or other data artifacts can violate these):
  - E(X_hat) = E(X)
  - The pooled variance of the adjusted population across groups is < the variance of the unadjusted population
- Enough units have pre-experiment values (> 100)
- A large enough percentage of units have pre-experiment values (> 5%)
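The conditions above can be read as a simple gate. The following is a hypothetical sketch, not Statsig's code: the function name, argument names, and the tolerance used to compare means are assumptions, and the mean check is one reading of E(X_hat) = E(X), namely that the adjusted and unadjusted means agree. Only the two thresholds (more than 100 units and more than 5% of units) come from the list above.

```python
import math

MIN_UNITS_WITH_PRE = 100    # from the list above: "> 100"
MIN_SHARE_WITH_PRE = 0.05   # from the list above: "> 5%"

def should_use_cuped(units_with_pre, total_units,
                     mean_unadjusted, mean_adjusted,
                     var_unadjusted, var_adjusted,
                     rel_tol=1e-6):
    """Return True only if every condition for using CUPED variance holds."""
    if total_units <= 0:
        return False
    # Assumption: "E(X_hat) = E(X)" is checked as equal means within a tolerance.
    means_match = math.isclose(mean_adjusted, mean_unadjusted,
                               rel_tol=rel_tol, abs_tol=1e-12)
    variance_reduced = var_adjusted < var_unadjusted
    enough_units = units_with_pre > MIN_UNITS_WITH_PRE
    enough_share = units_with_pre / total_units > MIN_SHARE_WITH_PRE
    return means_match and variance_reduced and enough_units and enough_share

# 500 of 1,000 units have pre-experiment data and the variance dropped.
print(should_use_cuped(500, 1_000, 10.0, 10.0, 4.0, 3.0))
# Only 50 units have pre-experiment data, so the gate fails.
print(should_use_cuped(50, 1_000, 10.0, 10.0, 4.0, 3.0))
```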
