
CUPED

How Statsig uses CUPED variance reduction to improve experiment sensitivity by adjusting for pre-experiment user behavior on metric values.

CUPED - Controlled-experiment Using Pre-Existing Data

CUPED (short for Controlled-experiment Using Pre-Existing Data) is a technique that uses user information from before an experiment to reduce variance and increase confidence in experimental metrics. At Statsig, this pre-experiment data is defined as the 7 days before each user's exposure rather than a fixed window before the experiment starts for all users. This helps debias experiments that have meaningful pre-exposure bias (for example, groups that were randomly different before any treatment was applied).

The Cloud product uses a 7-day window for CUPED calculation. For Warehouse Native customers, Statsig recommends a 7-day window, but you can customize it to any length.
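The per-user window can be illustrated with the following Python sketch. It sums each user's metric values over the 7 days before that user's own exposure. This is an assumption-laden example, not Statsig's implementation: the function name `pre_exposure_covariates`, the input shapes, and the half-open window boundary are all hypothetical. The `window_days` parameter stands in for the configurable window length available to Warehouse Native customers.

```python
from datetime import datetime, timedelta


def pre_exposure_covariates(exposures, events, window_days=7):
    """Sum each user's metric values over the window_days before their own exposure.

    exposures: dict mapping user_id -> exposure datetime (hypothetical shape)
    events: iterable of (user_id, timestamp, value) tuples (hypothetical shape)
    Users with no events in their window are left out of the result.
    """
    window = timedelta(days=window_days)
    totals = {}
    for user_id, ts, value in events:
        exposed_at = exposures.get(user_id)
        if exposed_at is None:
            continue  # user was never exposed to the experiment
        # Half-open window [exposure - window, exposure), anchored per user.
        if exposed_at - window <= ts < exposed_at:
            totals[user_id] = totals.get(user_id, 0.0) + value
    return totals


exposures = {"u1": datetime(2024, 1, 10), "u2": datetime(2024, 1, 20)}
events = [
    ("u1", datetime(2024, 1, 5), 3.0),   # inside u1's window
    ("u1", datetime(2024, 1, 1), 9.0),   # more than 7 days before u1's exposure
    ("u2", datetime(2024, 1, 15), 2.0),  # inside u2's window
    ("u2", datetime(2024, 1, 20), 4.0),  # at exposure time, so excluded
]
print(pre_exposure_covariates(exposures, events))  # {'u1': 3.0, 'u2': 2.0}
```

Because each window is anchored to the user's own exposure time, users who enter the experiment late still get a full 7 days of pre-period data.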

For more details, refer to the Variance Reduction page. For an in-depth look at the methodology, refer to CURE by Statsig.

CUPED for simple aggregations

The methodology for simple aggregations is described in the original Microsoft paper, as well as the in-depth article on the technique.
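For a simple aggregation, the core adjustment is $Y_{cv} = Y - \theta(X - \bar{X})$ with $\theta = Cov(X, Y) / Var(X)$. The Python sketch below applies it to a single list of units. The helper names are hypothetical, and this sketch estimates $\theta$ on whatever units it is given rather than following any particular pooling scheme.

```python
def mean(values):
    return sum(values) / len(values)


def sample_cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def cuped_adjust(y, x):
    """Return (adjusted values, theta) for Y_cv = Y - theta * (X - mean(X)).

    y: experiment-period metric value per unit
    x: pre-experiment value of the same metric per unit
    """
    theta = sample_cov(x, y) / sample_cov(x, x)  # Cov(X, Y) / Var(X)
    mx = mean(x)
    adjusted = [yi - theta * (xi - mx) for yi, xi in zip(y, x)]
    return adjusted, theta


pre = [1.0, 2.0, 3.0, 4.0, 5.0]
post = [2.1, 3.9, 6.2, 7.8, 10.1]
adjusted, theta = cuped_adjust(post, pre)
print(round(theta, 2))  # 1.99
# The mean is unchanged, but the spread across units shrinks sharply.
print(sample_cov(post, post) > sample_cov(adjusted, adjusted))  # True
```

Because $\theta$ is the least-squares slope of $Y$ on $X$, the adjustment removes the part of $Y$'s variance explained by $X$ while leaving the mean unchanged.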

The Cloud product uses stratification alongside CUPED to account for users who may not have pre-experiment data. Statsig groups users into strata based on available pre-experimentation information, estimates treatment and control effects within each stratum, then aggregates them to produce an overall result. Statsig then applies the standard difference-in-means and variance estimation. This approach lets Statsig retain users with missing pre-data while still benefiting from variance reduction where applicable.
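The stratified approach can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Statsig's exact estimator. It assumes just two strata: units with and without pre-experiment data. Inside the first stratum, it applies CUPED with a $\theta$ pooled across test and control. In the second stratum, it uses a plain difference in means. Each stratum is weighted by its share of units, and all function names are hypothetical.

```python
from statistics import fmean, variance


def diff_in_means(test, control):
    """Difference in means and its (unpooled) variance."""
    effect = fmean(test) - fmean(control)
    var = variance(test) / len(test) + variance(control) / len(control)
    return effect, var


def cuped_stratum(test_y, test_x, control_y, control_x):
    """CUPED inside one stratum, with theta pooled across test and control."""
    ys, xs = test_y + control_y, test_x + control_x
    mx, my = fmean(xs), fmean(ys)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    theta = cov_xy / variance(xs)

    def adjust(y_vals, x_vals):
        return [y - theta * (x - mx) for y, x in zip(y_vals, x_vals)]

    return diff_in_means(adjust(test_y, test_x), adjust(control_y, control_x))


def stratified_effect(strata):
    """Combine (effect, variance, n_units) tuples, weighting by unit share.

    Assumes strata are independent, so variances add with squared weights.
    """
    total = sum(n for _, _, n in strata)
    effect = sum(e * n / total for e, _, n in strata)
    var = sum(v * (n / total) ** 2 for _, v, n in strata)
    return effect, var


# Stratum 1: units with pre-experiment data get the CUPED adjustment.
test_y, test_x = [5.0, 7.0, 6.0, 9.0], [4.0, 6.0, 5.0, 8.0]
control_y, control_x = [4.0, 6.0, 5.0, 7.0], [4.0, 6.0, 5.0, 7.0]
with_pre = cuped_stratum(test_y, test_x, control_y, control_x)
# Stratum 2: units without pre-experiment data are kept, unadjusted.
without_pre = diff_in_means([3.0, 5.0, 4.0], [2.0, 4.0, 3.0])
effect, var = stratified_effect([(*with_pre, 8), (*without_pre, 6)])
```

Units without pre-experiment data still contribute to the overall estimate, while the stratum that has pre-data benefits from a much smaller variance.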

CUPED for ratio metrics

The Microsoft paper also describes how to implement CUPED when the analysis unit differs from the randomization unit (Appendix B). On Statsig, this extends to ratio metrics, where each experiment unit is represented by a numerator and a denominator. The variance reduction process requires the variance of the experiment-period ratio, the variance of the pre-experiment ratio, and the covariance between the two.

Denote the numerator, denominator, pre-experiment numerator, and pre-experiment denominator of a unit as $Y$, $N$, $X$, and $M$, respectively. Using the CUPED-reduced variance formula,

$$
Var\left(\frac{Y_{cv}}{N_{cv}}\right) = Var\left(\frac{Y}{N}\right) + \theta^2 Var\left(\frac{X}{M}\right) - 2\theta\, Cov\left(\frac{Y}{N}, \frac{X}{M}\right)
$$

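where the optimal $\theta$ is

$$
\theta = \frac{Cov\left(\frac{Y}{N}, \frac{X}{M}\right)}{Var\left(\frac{X}{M}\right)}
$$

which, after delta-method linearization of each ratio around the means $\mu_Y$, $\mu_N$, $\mu_X$, and $\mu_M$, expands to

$$
\theta = \frac{Cov\left(\frac{Y}{\mu_N}-\frac{\mu_Y N}{\mu^2_N}, \frac{X}{\mu_M}-\frac{\mu_X M}{\mu^2_M}\right)}{Var\left(\frac{X}{\mu_M}-\frac{\mu_X M}{\mu^2_M}\right)}
$$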
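The expanded form can be made concrete with the following Python sketch. It linearizes each ratio per unit with the delta method, estimates $\theta$ from the linearized values, and plugs it into the CUPED-reduced variance formula. The per-unit input lists and function names are hypothetical. This illustrates the formulas above rather than reproducing Statsig's code.

```python
from statistics import fmean, variance


def sample_cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def linearize(num, den):
    """Per-unit delta-method term num/mu_den - mu_num * den / mu_den**2."""
    mu_num, mu_den = fmean(num), fmean(den)
    return [n / mu_den - mu_num * d / mu_den**2 for n, d in zip(num, den)]


def ratio_cuped(Y, N, X, M):
    """Return (theta, Var(Y/N), CUPED-reduced Var) for a ratio metric.

    Y, N: per-unit numerator and denominator during the experiment
    X, M: per-unit numerator and denominator before exposure
    """
    ly, lx = linearize(Y, N), linearize(X, M)
    var_y, var_x, cov_yx = variance(ly), variance(lx), sample_cov(ly, lx)
    theta = cov_yx / var_x
    n = len(Y)
    # Variance of the ratio of means is the linearized variance divided by n.
    var_ratio = var_y / n
    var_cuped = (var_y + theta**2 * var_x - 2 * theta * cov_yx) / n
    return theta, var_ratio, var_cuped


Y = [3.0, 5.0, 2.0, 8.0, 6.0, 4.0]
N = [2.0, 3.0, 2.0, 5.0, 4.0, 3.0]
X = [2.0, 4.0, 2.0, 7.0, 5.0, 3.0]
M = [2.0, 3.0, 2.0, 5.0, 4.0, 2.0]
theta, var_ratio, var_cuped = ratio_cuped(Y, N, X, M)
```

Substituting the optimal $\theta$ into the variance formula simplifies it to $Var(Y/N)(1 - \rho^2)$, where $\rho$ is the correlation between the linearized experiment and pre-experiment terms. The reduction therefore grows with that correlation.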

This gives:

$$
\frac{\hat{Y_{c}}}{\hat{N_{c}}}=\frac{Y_{c}}{N_{c}}-\theta\left( \frac{X_{c}}{M_{c}} - \mathbb{E}[R]\right)
$$

$$
\frac{\hat{Y_{t}}}{\hat{N_{t}}}=\frac{Y_{t}}{N_{t}}-\theta\left( \frac{X_{t}}{M_{t}} - \mathbb{E}[R]\right)
$$

$\mathbb{E}[R]$ is hard to derive directly. However, the expectation term is the same for both groups, so it cancels out of the difference between them. Substituting $\frac{X_{c}}{M_{c}}$ for $\mathbb{E}[R]$ transforms the formulas above into the following two:

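$$
\frac{Y_{cv}(\text{control})}{N_{cv}(\text{control})}=\frac{Y(\text{control})}{N(\text{control})}
$$

$$
\begin{aligned}
\frac{Y_{cv}(\text{test})}{N_{cv}(\text{test})} &:= \frac{Y(\text{control})}{N(\text{control})} - \left(\frac{Y(\text{control})}{N(\text{control})} - \theta \frac{X(\text{control})}{M(\text{control})}\right) + \left(\frac{Y(\text{test})}{N(\text{test})} - \theta\frac{X(\text{test})}{M(\text{test})}\right) \\
&= \frac{Y(\text{test})}{N(\text{test})} - \theta\frac{X(\text{test})}{M(\text{test})} + \theta \frac{X(\text{control})}{M(\text{control})}
\end{aligned}
$$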
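The two adjusted ratios reduce to a few lines of arithmetic. The Python sketch below assumes $\theta$ has already been estimated. It also assumes each group is summarized by hypothetical summed totals `Y`, `N`, `X`, and `M`. The numbers in the usage example are made up for illustration.

```python
def adjusted_ratios(test, control, theta):
    """Return (control, test) CUPED-adjusted ratios, using X(control)/M(control) for E[R].

    test, control: dicts of summed totals with keys "Y", "N", "X", "M" (hypothetical shape)
    """
    pre_control = control["X"] / control["M"]  # stands in for E[R]
    # The control adjustment is theta * (pre_control - pre_control) = 0.
    control_adj = control["Y"] / control["N"]
    test_adj = test["Y"] / test["N"] - theta * test["X"] / test["M"] + theta * pre_control
    return control_adj, test_adj


control = {"Y": 500.0, "N": 1000.0, "X": 480.0, "M": 1000.0}
test = {"Y": 560.0, "N": 1000.0, "X": 500.0, "M": 1000.0}
control_adj, test_adj = adjusted_ratios(test, control, theta=0.8)
print(round(control_adj, 3), round(test_adj, 3))  # 0.5 0.544
```

In this example, the raw difference is 0.06. The test group's pre-experiment ratio was already 0.02 higher, so the adjustment attributes $0.8 \times 0.02 = 0.016$ of the difference to that pre-existing gap, leaving 0.044.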

Statsig estimates the optimal $\theta$ across groups and applies that same parameter to calculate each group's adjustment, which reduces group-level variance. An across-group $\theta$ isn't guaranteed to reduce the variance of any single group, or the sum of variances across all groups, but in most cases it does. In simulations, 98.3% of metrics saw a variance decrease from CUPED.

Statsig applies CUPED variance reduction only when all of the following conditions are met:

  • The core assumptions of the CUPED model are satisfied (rounding errors or other data artifacts can violate them):
    • $\mathbb{E}[\hat{X}] = \mathbb{E}[X]$
    • The pooled variance of the adjusted population across groups is less than the variance of the unadjusted population
  • More than 100 units have pre-experiment values
  • More than 5% of units have pre-experiment values
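The conditions above can be expressed as a single check, as in the hypothetical Python sketch below. The function name, its parameters, and the absolute tolerance `tol` used to compare means in the presence of rounding error are all assumptions. The caller is expected to supply the adjusted and unadjusted means and variances.

```python
def cuped_eligible(pre_values, mean_unadjusted, mean_adjusted,
                   var_unadjusted, var_adjusted_pooled,
                   min_units=100, min_share=0.05, tol=1e-9):
    """Return True if every listed CUPED condition holds (hypothetical helper).

    pre_values: one entry per unit, None when the unit has no pre-experiment value
    """
    n_with_pre = sum(v is not None for v in pre_values)
    share = n_with_pre / len(pre_values)
    return (
        abs(mean_adjusted - mean_unadjusted) <= tol  # E(X_hat) = E(X), up to rounding
        and var_adjusted_pooled < var_unadjusted      # adjustment actually lowers variance
        and n_with_pre > min_units                    # more than 100 units with pre data
        and share > min_share                         # more than 5% of units with pre data
    )


units = [1.0] * 150 + [None] * 850  # 150 units (15%) have pre-experiment values
print(cuped_eligible(units, 2.0, 2.0, 4.0, 3.1))  # True
```

If any single condition fails, the check returns False. For example, this happens when only 4% of units have pre-experiment values, even if more than 100 units do.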
