
Confidence Intervals

Confidence intervals are an intuitive way to quantify the uncertainty in the observed metric deltas. A 95% confidence interval should contain the true effect 95% of the time. This means that if we ran an experiment 100 times, we would expect the true value of the metric delta to fall inside the confidence interval in about 95 of those runs.
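The coverage claim can be checked with a small simulation. The sketch below is illustrative only: the function names, the normally distributed metric, the sample sizes, and the assumption that the two groups are independent (so the variance of the delta is the sum of the variances of the two means) are all choices made for this example, not part of the product's pipeline.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def mean_and_var_of_mean(values):
    """Return the sample mean and the estimated variance of that mean."""
    n = len(values)
    mean = fmean(values)
    sample_var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, sample_var / n


def coverage(true_delta=0.3, n=100, trials=2000, confidence=0.95, seed=0):
    """Fraction of simulated experiments whose CI contains the true delta."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        test = [rng.gauss(true_delta, 1.0) for _ in range(n)]
        mean_c, var_c = mean_and_var_of_mean(control)
        mean_t, var_t = mean_and_var_of_mean(test)
        delta = mean_t - mean_c
        # Assumption: independent groups, so the variances of the means add.
        half_width = z * sqrt(var_t + var_c)
        if delta - half_width <= true_delta <= delta + half_width:
            hits += 1
    return hits / trials


rate = coverage()
print(f"Observed coverage: {rate:.3f}")
```

With these settings the observed coverage should land close to 0.95, though any single simulation will deviate slightly from it.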

image

In practical terms, a 95% confidence interval that doesn't contain zero (the green bar above) represents a statistically significant result (with α = 0.05). If the true effect were zero, we would expect to see such a confidence interval only 5% of the time (a false positive). Wider confidence intervals imply less certainty about the exact size of the effect, since they span a larger range of likely values.

Computing Confidence Intervals

Confidence intervals are calculated using a two-sample z-test. This requires knowledge of the variance in the metric delta we're measuring, which is derived differently depending on the type of metric (details here).

For the absolute metric delta, the confidence interval is given by:

$$CI(\Delta \overline X) = \Delta \overline X \pm Z \cdot \sqrt{var(\Delta \overline X)}$$

where:

  • Z is the critical z-value for the desired confidence level (1.96 for a two-sided 95% confidence interval)
  • var(ΔX-bar) is the variance of the absolute delta
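As a minimal sketch of the absolute-delta formula, the function below takes the observed delta and its variance as inputs. The function name and the example numbers are hypothetical; how the variance is obtained depends on the metric type and is not shown here.

```python
from math import sqrt
from statistics import NormalDist


def absolute_delta_ci(delta, var_delta, confidence=0.95):
    """Two-sided confidence interval for an absolute metric delta.

    `var_delta` is the variance of the delta, assumed to be computed
    elsewhere for the metric type in question.
    """
    # Critical z-value: about 1.96 for a two-sided 95% interval.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(var_delta)
    return delta - half_width, delta + half_width


# Hypothetical inputs: delta of 0.5 with variance 0.04 (standard error 0.2).
low, high = absolute_delta_ci(delta=0.5, var_delta=0.04)
significant = not (low <= 0.0 <= high)
print(f"CI: [{low:.3f}, {high:.3f}], significant: {significant}")
```

For these inputs the interval is roughly [0.108, 0.892], which excludes zero, so the result would be read as statistically significant at α = 0.05.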

Similarly, the confidence interval for the relative metric delta is:

$$CI(\Delta \overline X\%) = \Delta \overline X\% \pm Z \cdot \sqrt{var(\Delta \overline X\%)} = \Delta \overline X\% \pm Z \cdot \sqrt{var\left(\frac{\Delta \overline X}{\overline X_c}\right)} \cdot 100\%$$
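The relative version follows the same pattern, scaled to percent. In this sketch the variance of the ratio ΔX̄ / X̄_c is passed in as `var_ratio`, since its derivation is not covered here; the function name and the input values are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def relative_delta_ci(mean_t, mean_c, var_ratio, confidence=0.95):
    """Two-sided confidence interval, in percent, for a relative delta.

    `var_ratio` is the variance of (mean_t - mean_c) / mean_c, assumed
    to be computed elsewhere.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    relative_delta_pct = (mean_t - mean_c) / mean_c * 100
    half_width_pct = z * sqrt(var_ratio) * 100
    return relative_delta_pct - half_width_pct, relative_delta_pct + half_width_pct


# Hypothetical inputs: a 5% lift with a ratio standard error of 0.02.
low_pct, high_pct = relative_delta_ci(mean_t=10.5, mean_c=10.0, var_ratio=0.0004)
print(f"Relative CI: [{low_pct:.2f}%, {high_pct:.2f}%]")
```

Here the interval is about [1.08%, 8.92%] around the observed 5% relative delta.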

Welch's T-test for Small Sample Sizes

For small sample sizes, we use Welch's t-test instead of a standard z-test. This statistical test is a better choice for handling samples of unequal size or variance without increasing the false positive rate. The structure of the confidence interval calculation remains the same as above, but instead of the z-statistic we use the t-statistic with degrees of freedom ν given by:

$$\nu = \frac{\left(var(\overline X_t) + var(\overline X_c)\right)^2}{\frac{var(\overline X_t)^2}{N_t - 1} + \frac{var(\overline X_c)^2}{N_c - 1}}$$

where N_t and N_c are the numbers of users in the test and control groups, respectively. Note that for a large number of degrees of freedom, the t-distribution converges to the standard normal distribution, so the t-statistic approaches the z-statistic. Therefore, Welch's t-test is used only when ν < 100.
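The degrees-of-freedom formula and the ν < 100 rule can be sketched as follows. The function names and sample values are hypothetical, and the inputs are the variances of the group means (sample variance divided by group size). The critical t-value itself is not in the Python standard library; one option, assuming SciPy is available, is `scipy.stats.t.ppf(0.975, nu)`.

```python
def welch_degrees_of_freedom(var_mean_t, var_mean_c, n_t, n_c):
    """Degrees of freedom for Welch's t-test.

    `var_mean_t` and `var_mean_c` are the variances of the test and
    control means; `n_t` and `n_c` are the group sizes.
    """
    numerator = (var_mean_t + var_mean_c) ** 2
    denominator = var_mean_t ** 2 / (n_t - 1) + var_mean_c ** 2 / (n_c - 1)
    return numerator / denominator


def use_welch(nu, threshold=100):
    """Apply the rule from the text: use the t-statistic only when nu < 100."""
    return nu < threshold


# Hypothetical small experiment: sample variances 4.0 and 9.0,
# with 20 test users and 30 control users.
n_t, n_c = 20, 30
var_mean_t = 4.0 / n_t  # variance of the test mean
var_mean_c = 9.0 / n_c  # variance of the control mean

nu = welch_degrees_of_freedom(var_mean_t, var_mean_c, n_t, n_c)
print(f"nu = {nu:.1f}, use Welch's t-test: {use_welch(nu)}")
```

For these inputs ν is about 48, so the t-statistic would be used in place of the z-statistic. With equal variances and equal group sizes N, the formula reduces to ν = 2(N - 1).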