Confidence Intervals

How Statsig calculates confidence intervals for experiment metrics, including the formulas, assumptions, and how to interpret intervals in scorecards.

Confidence intervals quantify the uncertainty in observed metric deltas. A procedure that produces 95% confidence intervals captures the true effect 95% of the time: if the same experiment ran 100 times, about 95 of the resulting confidence intervals would contain the true value of the metric delta.

Confidence interval visualization showing statistical significance

A 95% confidence interval that doesn't contain zero (the green bar above) represents a statistically significant result (with α = 0.05). Relative deltas are an exception: the p-value of the absolute difference between test and control can be statistically significant while, because of uncertainty in the control mean, the relative delta confidence interval crosses zero (using the Delta Method) or is shown as a point estimate (using Fieller Intervals).

If the true effect were zero, you would expect the confidence interval to exclude zero only 5% of the time (a false positive). Wider confidence intervals imply less certainty about the exact size of the effect, because they span a larger range of likely values.
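The coverage interpretation above can be checked with a small simulation. This is a minimal sketch, not Statsig's implementation: the function name `coverage`, the normally distributed data, and the per-group variance formula (sample variance divided by sample size) are all illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean, variance


def coverage(true_delta=0.5, n=100, trials=1000, alpha=0.05, seed=7):
    """Return the fraction of simulated experiments whose CI contains true_delta."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(10.0, 2.0) for _ in range(n)]
        test = [rng.gauss(10.0 + true_delta, 2.0) for _ in range(n)]
        delta = fmean(test) - fmean(control)
        # Assumption: variance of the delta for a simple mean metric.
        var_delta = variance(test) / n + variance(control) / n
        half_width = z * var_delta ** 0.5
        if delta - half_width <= true_delta <= delta + half_width:
            hits += 1
    return hits / trials


print(f"empirical coverage: {coverage():.3f}")
```

With enough simulated experiments, the printed fraction should land close to 0.95.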

Computing confidence intervals

Statsig calculates confidence intervals using a two-sample z-test. This test requires the variance of the metric delta being measured, which Statsig derives differently depending on the metric type (details at Variance).

After establishing the variance of the delta, you can compute the confidence intervals.

Two-sided tests

For the absolute metric delta, the confidence interval is given by:

$$CI(\Delta \overline{X}) = \Delta \overline{X} \pm Z_{\alpha/2} \cdot \sqrt{var(\Delta \overline{X})}$$

where:

  • $Z_{\alpha/2}$ is the z-critical value for the desired significance level (1.96 for the standard $\alpha=0.05$ and 95% confidence interval) for a two-sided test
  • $var(\Delta \overline{X})$ is the variance of the absolute delta (details at Variance)
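
The two-sided interval for the absolute delta can be sketched in a few lines of Python. The function name `two_sided_ci` and its parameters are hypothetical, and the variance of the delta is assumed to be the sum of each group's sample variance divided by its sample size, which holds for a simple mean metric but may differ from how Statsig derives variance for other metric types.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_ci(mean_t, mean_c, var_t, var_c, n_t, n_c, alpha=0.05):
    """Two-sided CI for the absolute delta between test and control means."""
    delta = mean_t - mean_c
    # Assumption: simple mean metric with independent groups.
    var_delta = var_t / n_t + var_c / n_c
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * sqrt(var_delta)
    return delta - half_width, delta + half_width


low, high = two_sided_ci(mean_t=10.5, mean_c=10.0, var_t=4.0, var_c=4.0,
                         n_t=1000, n_c=1000)
print(f"95% CI for the absolute delta: [{low:.4f}, {high:.4f}]")
```

In this made-up example the interval excludes zero, so the result would be statistically significant at α = 0.05.
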

The confidence interval for the relative metric delta can use one of two methods: Fieller Intervals or the Delta Method. You can opt for either method. Statsig enables Fieller Intervals for all new customers by default.

When using Fieller Intervals, compute the relative metric delta CI using:

$$CI(\% \Delta \overline{X}) = \frac{1}{1-g} \left( \frac{\overline{X_T}}{\overline{X_C}} - 1 \pm \frac{Z_{\alpha/2}}{\sqrt{n_C} \cdot \overline{X_C}} \sqrt{(1-g) \cdot \frac{var(X_T)}{n_T(n_T-1)} + \frac{\overline{X_T} \, var(X_C)}{\overline{X_C} \, n_C (n_C-1)}} \right)$$

When using the Delta Method, the confidence interval is:

$$\begin{split} CI(\Delta \overline X\%) &= \Delta \overline X\% \pm Z_{\alpha/2} \cdot \sqrt{var(\Delta \overline X\%)} \\ &= \frac{\Delta \overline X}{\overline X_c} \pm Z_{\alpha/2} \cdot \sqrt{\left(\frac{\overline X_t}{\overline X_c}\right)^{2} \cdot \left(\frac{var(X_c)}{n_c \cdot \overline X_c^2} + \frac{var(X_t)}{n_t \cdot \overline X_t^2}\right)} \cdot 100\% \end{split}$$
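As an illustration, the Delta Method interval can be computed directly from group summary statistics. This is a sketch under stated assumptions: the function name `delta_method_relative_ci` and its parameters are hypothetical, `var_t` and `var_c` are taken to be per-user sample variances, and both bounds are scaled to percent.

```python
from math import sqrt
from statistics import NormalDist


def delta_method_relative_ci(mean_t, mean_c, var_t, var_c, n_t, n_c, alpha=0.05):
    """Two-sided Delta Method CI for the relative delta, in percent."""
    rel_delta = (mean_t - mean_c) / mean_c
    ratio = mean_t / mean_c
    # Variance of the relative delta, following the Delta Method formula.
    var_rel = ratio ** 2 * (var_c / (n_c * mean_c ** 2)
                            + var_t / (n_t * mean_t ** 2))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(var_rel)
    return 100 * (rel_delta - half_width), 100 * (rel_delta + half_width)


rel_low, rel_high = delta_method_relative_ci(
    mean_t=10.5, mean_c=10.0, var_t=4.0, var_c=4.0, n_t=1000, n_c=1000)
print(f"95% CI for the relative delta: [{rel_low:.2f}%, {rel_high:.2f}%]")
```

The point estimate in this made-up example is a 5% lift, and the interval is roughly symmetric around it.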

If you use the Delta Method and the control mean is not significantly different from zero, the interval simplifies to:

$$\begin{split} CI(\Delta \overline X\%) &= \Delta \overline X\% \pm Z_{\alpha/2} \cdot \sqrt{var(\Delta \overline X\%)} \\ &= \frac{\Delta \overline X}{\overline X_c} \pm Z_{\alpha/2} \cdot \frac{\sqrt{var\left(\Delta \overline X\right)}}{\overline X_c} \cdot 100\% \end{split}$$

One-sided tests

When running one-sided tests, the confidence interval calculation changes to account for a redistribution of the desired false positive rate when looking for increases or decreases in the metric:

$$CI(\Delta \overline{X}) = \begin{cases} \left[\Delta \overline{X} - Z_{\alpha} \cdot \sqrt{var(\Delta \overline{X})}, \quad +\infty \right) & \text{if right-hand test} \\ \left(-\infty, \quad \Delta \overline{X} + Z_{\alpha} \cdot \sqrt{var(\Delta \overline{X})} \right] & \text{if left-hand test} \end{cases}$$

where:

  • $Z_{\alpha}$ is the z-critical value for the desired significance level (1.645 for the standard $\alpha=0.05$ and 95% confidence interval) for a one-sided test
  • $var(\Delta \overline{X})$ is the same as for two-sided tests
  • the choice of confidence interval depends on whether the one-sided test is looking for increases or decreases in the metric
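
A one-sided interval differs from the two-sided case only in the critical value and in leaving one bound open. The sketch below assumes the variance of the delta has already been computed. The function name `one_sided_ci` and the `direction` argument are hypothetical.

```python
from math import inf, sqrt
from statistics import NormalDist


def one_sided_ci(delta, var_delta, direction="right", alpha=0.05):
    """One-sided CI for the absolute delta.

    direction="right" looks for increases, direction="left" for decreases.
    """
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    margin = z * sqrt(var_delta)
    if direction == "right":
        return delta - margin, inf
    if direction == "left":
        return -inf, delta + margin
    raise ValueError("direction must be 'right' or 'left'")


right_low, right_high = one_sided_ci(delta=0.5, var_delta=0.008, direction="right")
print(f"right-hand test: [{right_low:.4f}, +inf)")
```

Because the full false positive rate sits on one side, the finite bound is closer to the observed delta than the corresponding two-sided bound.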

Welch's t-test for small sample sizes

For small sample sizes, Statsig uses Welch's t-test instead of a standard z-test. Welch's t-test handles samples of unequal size or variance without increasing the false positive rate. The confidence interval calculation follows the same structure as the two-sample z-test (depending on whether the test is one- or two-sided), replacing the z-critical value with the t-critical value with degrees of freedom $\nu$.

For a two-sided test, the confidence interval is therefore:

$$CI(\Delta \overline{X}) = \Delta \overline{X} \pm t_{\alpha/2} \cdot \sqrt{var(\Delta \overline{X})}$$

$$\nu = \frac{\left(var(\overline X_t) + var(\overline X_c)\right)^2}{\frac{var(\overline X_t)^2}{N_t - 1}+\frac{var(\overline X_c)^2}{N_c - 1}} = \frac{var(\Delta\overline{X})^2}{\frac{var(\overline X_t)^2}{N_t - 1}+\frac{var(\overline X_c)^2}{N_c - 1}}$$

where $N_t$ and $N_c$ are the number of users in the test and control groups, respectively. As the number of degrees of freedom grows, the t-distribution converges to the standard normal distribution, so the t-critical value approaches the z-critical value. Statsig therefore uses Welch's t-test only when $\nu < 100$.
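The degrees-of-freedom formula and the $\nu < 100$ rule can be sketched as follows. The function names are hypothetical, and the variance of each group mean is assumed to be the sample variance divided by the group size. The t-critical value itself is not computed here because the Python standard library has no t-distribution; a library such as SciPy (`scipy.stats.t.ppf`) could supply it.

```python
def welch_degrees_of_freedom(var_t, var_c, n_t, n_c):
    """Welch-Satterthwaite degrees of freedom from per-user sample variances."""
    # Assumption: variance of each group mean is sample variance / group size.
    var_mean_t = var_t / n_t
    var_mean_c = var_c / n_c
    numerator = (var_mean_t + var_mean_c) ** 2
    denominator = var_mean_t ** 2 / (n_t - 1) + var_mean_c ** 2 / (n_c - 1)
    return numerator / denominator


def uses_welch(var_t, var_c, n_t, n_c, threshold=100):
    """Apply the nu < 100 rule described above."""
    return welch_degrees_of_freedom(var_t, var_c, n_t, n_c) < threshold


nu_small = welch_degrees_of_freedom(var_t=4.0, var_c=4.0, n_t=12, n_c=15)
nu_large = welch_degrees_of_freedom(var_t=4.0, var_c=4.0, n_t=1000, n_c=1000)
print(f"small sample: nu = {nu_small:.2f}, Welch: {uses_welch(4.0, 4.0, 12, 15)}")
print(f"large sample: nu = {nu_large:.0f}, Welch: {uses_welch(4.0, 4.0, 1000, 1000)}")
```

With equal variances and equal group sizes, $\nu$ reduces to $2(N - 1)$, so groups of 1,000 users each fall well above the threshold.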

Comparing experiment data to a fixed baseline: one-sample t-test

To answer questions such as "Does my test variant lead to a click-through rate higher than 0.5?", define a fixed-baseline comparison when adding metrics to the experiment.

Statsig calculates the confidence interval as:

$$CI(\Delta \overline X) = (\overline X_{group} - \text{fixed value}) \pm Z \cdot \sqrt{var(\overline X_{group})}$$
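
A fixed-baseline comparison can be sketched in the same way as the two-sample case, with the baseline treated as a constant that contributes no variance. The function name `fixed_baseline_ci` is hypothetical. The sketch assumes a two-sided z-critical value and takes the variance of the group mean to be the sample variance divided by the group size, neither of which is specified above.

```python
from math import sqrt
from statistics import NormalDist


def fixed_baseline_ci(mean_group, var_group, n_group, fixed_value, alpha=0.05):
    """Two-sided CI for the difference between a group mean and a fixed value."""
    diff = mean_group - fixed_value
    # Assumption: variance of the group mean is sample variance / group size.
    var_mean = var_group / n_group
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(var_mean)
    return diff - half_width, diff + half_width


# Made-up click-through rate of 0.53 over 2,000 users, compared against 0.5.
ctr = 0.53
base_low, base_high = fixed_baseline_ci(
    mean_group=ctr, var_group=ctr * (1 - ctr), n_group=2000, fixed_value=0.5)
print(f"95% CI for CTR minus 0.5: [{base_low:.4f}, {base_high:.4f}]")
```

An interval entirely above zero would suggest the variant's click-through rate is higher than the 0.5 baseline.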
