CUPED - Controlled-experiment Using Pre-Existing Data
CUPED (short for Controlled-experiment Using Pre-Existing Data) is a technique that uses information about users from before an experiment to reduce the variance of experimental metrics and increase confidence in their estimates. It can also help debias experiments that have meaningful pre-exposure bias (for example, when the groups happened to differ before any treatment was applied).
Our Cloud product uses a 7-day window for CUPED calculation. For Warehouse Native customers, a 7-day window is recommended, but you have the flexibility to customize it to any length.
See more at the Variance Reduction page.
CUPED for Simple Aggregations
The methodology for simple aggregations is described in the original Microsoft paper, as well as our in-depth article on the technique.
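As a rough illustration of the idea covered in those references, the sketch below applies the commonly cited form of the adjustment, Y_cv = Y − θ(X − mean(X)) with θ = Cov(Y, X) / Var(X), to synthetic data. The function name `cuped_adjust`, the synthetic data, and the choice to estimate θ on a single pooled sample are assumptions made for this example, not a description of the production implementation.

```python
import numpy as np

def cuped_adjust(y, x):
    """Return CUPED-adjusted values of y using the pre-experiment covariate x.

    Hypothetical helper for illustration only.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # theta = Cov(Y, X) / Var(X) minimizes Var(Y - theta * X)
    cov = np.cov(y, x, ddof=1)
    theta = cov[0, 1] / cov[1, 1]
    # Centering x keeps the mean of the adjusted metric equal to the mean of y
    return y - theta * (x - x.mean())

# Synthetic data: the in-experiment metric is correlated with the pre-experiment one
rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, size=5000)      # pre-experiment metric per user
y = x + rng.normal(0.0, 1.0, size=5000)   # in-experiment metric per user

y_cv = cuped_adjust(y, x)
print("variance before:", y.var(ddof=1))
print("variance after: ", y_cv.var(ddof=1))
```

The adjusted metric keeps the same mean as the original while its variance shrinks by roughly the squared correlation between the two periods.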
CUPED for Ratio Metrics
CUPED can also be applied to ratio metrics, where each experiment unit is represented by a numerator and a denominator. The variance reduction is computed from the variance of the experiment data, the variance of the pre-experiment data, and the covariance between the two.
Denote the numerator, denominator, pre-experiment numerator, and pre-experiment denominator as Y, N, X, and M, respectively. Using the CUPED-reduced variance formula,
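$$
\mathrm{Var}\left(\frac{Y_{cv}}{N_{cv}}\right) = \mathrm{Var}\left(\frac{Y}{N}\right) + \theta^2 \, \mathrm{Var}\left(\frac{X}{M}\right) - 2\theta \, \mathrm{Cov}\left(\frac{Y}{N}, \frac{X}{M}\right)
$$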
where optimal θ is found as
$$
\theta = \frac{\mathrm{Cov}\left(\frac{Y}{N}, \frac{X}{M}\right)}{\mathrm{Var}\left(\frac{X}{M}\right)}
$$
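which, using the delta-method linearization of each ratio, expands to
$$
\theta = \frac{\mathrm{Cov}\left(\frac{Y}{\mu_N} - \frac{\mu_Y N}{\mu_N^2}, \; \frac{X}{\mu_M} - \frac{\mu_X M}{\mu_M^2}\right)}{\mathrm{Var}\left(\frac{X}{\mu_M} - \frac{\mu_X M}{\mu_M^2}\right)}
$$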
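The expanded expression for θ can be sketched in code by linearizing each ratio per unit and then taking an ordinary covariance and variance. This is a minimal sketch under stated assumptions: the helper names `linearize_ratio` and `ratio_cuped_theta` are hypothetical, the means are taken over whatever sample is passed in, and the data is synthetic.

```python
import numpy as np

def linearize_ratio(num, den):
    """Per-unit linearization of a ratio: num / mu_den - mu_num * den / mu_den**2."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    mu_num, mu_den = num.mean(), den.mean()
    return num / mu_den - mu_num * den / mu_den**2

def ratio_cuped_theta(y, n, x, m):
    """Estimate theta = Cov(lin(Y, N), lin(X, M)) / Var(lin(X, M))."""
    lin_exp = linearize_ratio(y, n)   # experiment-period ratio Y/N
    lin_pre = linearize_ratio(x, m)   # pre-experiment ratio X/M
    cov = np.cov(lin_exp, lin_pre, ddof=1)
    return cov[0, 1] / cov[1, 1], cov

# Synthetic users whose per-user rate persists across both periods
rng = np.random.default_rng(1)
size = 5000
rate = rng.normal(2.0, 0.5, size=size)
m = rng.poisson(5, size=size) + 1             # pre-experiment denominator
x = rate * m + rng.normal(0.0, 1.0, size)     # pre-experiment numerator
n = rng.poisson(5, size=size) + 1             # experiment denominator
y = rate * n + rng.normal(0.0, 1.0, size)     # experiment numerator

theta, cov = ratio_cuped_theta(y, n, x, m)
var_exp, var_pre, cov_exp_pre = cov[0, 0], cov[1, 1], cov[0, 1]
# CUPED-reduced variance: Var(Y/N) + theta^2 Var(X/M) - 2 theta Cov(Y/N, X/M)
var_reduced = var_exp + theta**2 * var_pre - 2 * theta * cov_exp_pre
print("theta:", theta)
print("linearized variance before/after:", var_exp, var_reduced)
```

With the optimal θ, the reduced variance equals the original variance minus Cov² / Var(X/M), so it can never exceed the unadjusted variance.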
The CUPED-adjusted group means are inferred based on the control group.
$$
\frac{Y_{cv}}{N_{cv}} = \frac{Y}{N} - \theta \frac{X}{M} + \theta E[R]
$$
Because E[R] is hard to derive directly, we instead anchor the adjustment on the control group, recognizing that
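$$
\frac{Y_{cv}^{(\text{control})}}{N_{cv}^{(\text{control})}} = \frac{Y^{(\text{control})}}{N^{(\text{control})}}
$$
$$
\frac{Y_{cv}^{(\text{test})}}{N_{cv}^{(\text{test})}} = \frac{Y^{(\text{control})}}{N^{(\text{control})}} - \left(\frac{Y^{(\text{control})}}{N^{(\text{control})}} - \theta \frac{X^{(\text{control})}}{M^{(\text{control})}}\right) + \left(\frac{Y^{(\text{test})}}{N^{(\text{test})}} - \theta \frac{X^{(\text{test})}}{M^{(\text{test})}}\right)
$$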
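The two group-level formulas above can be sketched as a small function. This is an illustration under assumptions: each group ratio is computed here as the sum of numerators over the sum of denominators, θ is taken as given, and the names `group_ratio` and `cuped_adjusted_ratio_means` as well as the toy numbers are hypothetical.

```python
import numpy as np

def group_ratio(num, den):
    """Group-level ratio, assumed here to be total numerator over total denominator."""
    return float(np.sum(num)) / float(np.sum(den))

def cuped_adjusted_ratio_means(control, test, theta):
    """Return (adjusted control mean, adjusted test mean).

    control and test are dicts holding per-unit arrays under the keys
    "y", "n" (experiment period) and "x", "m" (pre-experiment period).
    """
    ctrl_mean = group_ratio(control["y"], control["n"])
    ctrl_pre = group_ratio(control["x"], control["m"])
    test_mean = group_ratio(test["y"], test["n"])
    test_pre = group_ratio(test["x"], test["m"])

    # The control group keeps its unadjusted mean
    adj_control = ctrl_mean
    # The test group is expressed relative to control, so E[R] cancels out
    adj_test = ctrl_mean - (ctrl_mean - theta * ctrl_pre) + (test_mean - theta * test_pre)
    return adj_control, adj_test

# Toy data: the test group already had a slightly higher ratio before the experiment
control = {"y": [100, 100], "n": [50, 50], "x": [90, 100], "m": [50, 50]}
test = {"y": [120, 100], "n": [50, 50], "x": [100, 100], "m": [50, 50]}

adj_control, adj_test = cuped_adjusted_ratio_means(control, test, theta=0.5)
print(adj_control, adj_test)
```

Algebraically, the test-group expression simplifies to the unadjusted test ratio minus θ times the difference between the test and control pre-experiment ratios, so part of any pre-existing gap between the groups is removed from the test mean.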