Confidence Intervals

How Statsig Warehouse Native computes confidence intervals for experiment metrics, including formulas, assumptions, and how to interpret them.

Confidence intervals are an intuitive way to quantify the uncertainty in the observed metric deltas. A 95% confidence interval should contain the true effect 95% of the time: if you ran the same experiment 100 times, the true value of the metric delta would fall inside the computed confidence interval in about 95 of them.

Confidence interval visualization showing statistical significance

In practical terms, a 95% confidence interval that doesn't contain zero (the green bar above) represents a statistically significant result (with α = 0.05). However, significance and confidence intervals don't always align this way. The p-value of the difference between test and control can be statistically significant even when the relative delta confidence interval crosses zero (with the Delta Method) or appears as a point estimate (with Fieller Intervals). This mismatch happens because of uncertainty in the control.

If the true effect were zero, you would expect the confidence interval to exclude zero only 5% of the time (a false positive). Wider confidence intervals imply less certainty about the exact size of the effect, because they span a larger range of likely values.
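The coverage claim can be checked with a small simulation. The sketch below is illustrative only and is not Statsig's implementation: it assumes normally distributed per-user values, a simple mean metric whose delta variance is the sum of the two sample variances divided by their group sizes, and a normal-approximation interval. The function name `coverage_rate` and all parameter values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance


def coverage_rate(true_delta=0.0, n=100, trials=1000, alpha=0.05, seed=0):
    """Fraction of simulated experiments whose CI contains the true delta."""
    rng = random.Random(seed)
    # Two-sided z-critical value, about 1.96 when alpha = 0.05
    z = NormalDist().inv_cdf(1 - alpha / 2)
    covered = 0
    for _ in range(trials):
        # Assumption: unit-variance normal data in both groups
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        test = [rng.gauss(true_delta, 1.0) for _ in range(n)]
        delta = fmean(test) - fmean(control)
        # Assumption: var(delta) = var(test)/n + var(control)/n
        se = sqrt(variance(test) / n + variance(control) / n)
        if delta - z * se <= true_delta <= delta + z * se:
            covered += 1
    return covered / trials


rate = coverage_rate()
print(f"Share of intervals containing the true delta: {rate:.3f}")
```

With a true delta of zero, one minus the printed share is the simulated false positive rate, which should land near 5%.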

Computing confidence intervals

Statsig calculates confidence intervals using a two-sample z-test. This test requires knowledge of the variance in the measured metric delta, which Statsig derives differently depending on the type of metric (refer to how Statsig derives variance).

After establishing the variance of the delta, Statsig computes the confidence intervals.

Two-sided tests

For the absolute metric delta, the confidence interval is:

CI(\Delta \overline{X}) = \Delta \overline{X} \pm Z_{\alpha/2} \cdot \sqrt{{var(\Delta \overline{X})}}

where:

$Z_{\alpha/2}$ is the z-critical value for the significance level you want (1.96 for the standard $\alpha=0.05$ and 95% confidence interval) for a two-sided test
$var(\Delta \overline{X})$ is the variance of the absolute delta (details here)
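
As a minimal sketch of the two-sided formula above, the following Python computes the interval from an already-known delta and delta variance. The function name `two_sided_ci` and the input values are hypothetical, and deriving the variance itself is out of scope here.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_ci(delta, var_delta, alpha=0.05):
    """Return (lower, upper) for the absolute delta at significance level alpha."""
    # z-critical value for a two-sided test, about 1.96 when alpha = 0.05
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(var_delta)
    return delta - half_width, delta + half_width


# Hypothetical inputs: observed delta of 0.5 with variance 0.04 (standard error 0.2)
lower, upper = two_sided_ci(0.5, 0.04)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")  # 95% CI: [0.108, 0.892]
```

Because this interval excludes zero, the hypothetical result would be statistically significant at α = 0.05.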

The confidence interval for the relative metric delta can use one of two methods: Fieller Intervals or the Delta Method. Customers can choose either method in their Statsig console. Statsig recommends Fieller Intervals and enables them by default for all new customers.

When using Fieller Intervals, you can compute the relative metric delta CI using:

CI(\% \Delta \overline{X} ) = \frac{1}{1-g} ( \frac{\overline{X_T}}{\overline{X_C}} - 1 \pm \frac{Z_{\alpha/2}}{\sqrt{n_C} \cdot \overline{X_C}} \sqrt{(1-g) \cdot \frac{var(X_T)}{n_T(n_T-1)} + \frac{\overline{X_T} var(X_C)}{\overline{X_C} n_C (n_C-1)}})

When using the Delta Method, the confidence interval is:

\begin{split} CI(\Delta \overline X\%) &= \Delta \overline X\% \pm Z_{\alpha/2} \cdot\sqrt{{var(\Delta \overline X\%)}}\\ &= \frac{\Delta \overline X}{\overline X_c} \pm Z_{\alpha/2} \cdot\sqrt{(\frac{\overline X_t}{\overline X_c})^{2} \cdot (\frac{var(X_c)}{n_c \cdot \overline X_c^2} + \frac{var(X_t)}{n_t \cdot \overline X_t^2})} \cdot 100\% \end{split}
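
The Delta Method formula above can be sketched in Python as follows. This is an illustrative transcription of the equation, not Statsig's code. The function name `delta_method_relative_ci` and the example inputs are hypothetical, and the result is returned in percent.

```python
from math import sqrt
from statistics import NormalDist


def delta_method_relative_ci(mean_t, mean_c, var_t, var_c, n_t, n_c, alpha=0.05):
    """Return (lower, upper) for the relative delta, in percent."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ratio = mean_t / mean_c
    rel_delta = (mean_t - mean_c) / mean_c
    # var(relative delta) =
    #   (mean_t / mean_c)^2 * (var_c / (n_c * mean_c^2) + var_t / (n_t * mean_t^2))
    var_rel = ratio ** 2 * (
        var_c / (n_c * mean_c ** 2) + var_t / (n_t * mean_t ** 2)
    )
    half_width = z * sqrt(var_rel)
    return (rel_delta - half_width) * 100, (rel_delta + half_width) * 100


# Hypothetical inputs: a 5% observed lift with 10,000 users per group
lower_pct, upper_pct = delta_method_relative_ci(
    mean_t=10.5, mean_c=10.0, var_t=25.0, var_c=25.0, n_t=10_000, n_c=10_000
)
print(f"95% CI for relative delta: [{lower_pct:.2f}%, {upper_pct:.2f}%]")
```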

If you use the Delta Method and the control mean isn't significantly different from zero, the confidence interval simplifies to:

\begin{split} CI(\Delta \overline X\%) &= \Delta \overline X\% \pm Z_{\alpha/2} \cdot\sqrt{{var(\Delta \overline X\%)}} \\ &= \frac{\Delta \overline X}{\overline X_c} \pm Z_{\alpha/2} \cdot \frac{\sqrt{{var\left(\Delta \overline X\right)}}}{\overline X_c} \cdot 100\% \end{split}

One-sided tests

When running one-sided tests, the form of the confidence interval changes slightly: the entire false positive rate you want is allocated to one side, depending on whether you are looking for increases or decreases in the metric:

CI(\Delta \overline{X}) = \begin{cases} \left[\Delta \overline{X} - Z_{\alpha} \cdot \sqrt{{var(\Delta \overline{X})}}, \quad +\infty \right) & \text{if right-hand test}\\ \\ \left(-\infty, \quad \Delta \overline{X} + Z_{\alpha} \cdot \sqrt{{var(\Delta \overline{X})}} \right] & \text{if left-hand test} \end{cases}

where:

$Z_{\alpha}$ is the z-critical value for the significance level you want (1.645 for the standard $\alpha=0.05$ and 95% confidence interval) for a one-sided test
$var(\Delta \overline{X})$ is the same as for two-sided tests
the choice of confidence interval depends on whether the one-sided test is looking for increases or decreases in the metric
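
A minimal sketch of the one-sided intervals, assuming the delta and its variance are already known. The function name `one_sided_ci` and its `direction` argument are hypothetical names chosen for illustration.

```python
from math import inf, sqrt
from statistics import NormalDist


def one_sided_ci(delta, var_delta, alpha=0.05, direction="right"):
    """Return (lower, upper) for a right-hand or left-hand one-sided test."""
    # One-sided z-critical value, about 1.645 when alpha = 0.05
    z = NormalDist().inv_cdf(1 - alpha)
    half_width = z * sqrt(var_delta)
    if direction == "right":
        # Looking for increases: bounded below, unbounded above
        return delta - half_width, inf
    if direction == "left":
        # Looking for decreases: unbounded below, bounded above
        return -inf, delta + half_width
    raise ValueError("direction must be 'right' or 'left'")


# Hypothetical inputs: observed delta of 0.5 with variance 0.04
right_ci = one_sided_ci(0.5, 0.04, direction="right")
left_ci = one_sided_ci(0.5, 0.04, direction="left")
print(right_ci, left_ci)
```

For the same α, the finite bound sits closer to the observed delta than in the two-sided case, because 1.645 is smaller than 1.96.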

Welch's t-test for small sample sizes

For small sample sizes, Statsig uses Welch's t-test instead of a standard z-test. Welch's t-test handles samples of unequal size or variance without increasing the false positive rate. The structure of the confidence interval calculation remains the same (depending on whether you use a one- or two-sided test), except that the z-critical value is replaced by the t-critical value with $\nu$ degrees of freedom.

For a two-sided test, the confidence interval is therefore:

CI(\Delta \overline{X}) = \Delta \overline{X} \pm t_{\alpha/2} \cdot \sqrt{{var(\Delta \overline{X})}}

\nu = \frac{\left(var(\overline X_t) + var(\overline X_c)\right)^2}{\frac{var(\overline X_t)^2}{N_t - 1}+\frac{var(\overline X_c)^2}{N_c - 1}} = \frac{var(\Delta\overline{X})^2}{\frac{var(\overline X_t)^2}{N_t - 1}+\frac{var(\overline X_c)^2}{N_c - 1}}

where $N_t$ and $N_c$ are the number of users in the test and control groups, respectively. For a large number of degrees of freedom, the t-statistic converges to the z-statistic. Therefore, Statsig uses Welch's t-test only when $\nu < 100$.
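
The degrees-of-freedom formula and the $\nu < 100$ rule can be sketched as below. The function names are hypothetical, and the inputs are the variances of the group means (for a simple mean metric, assumed here to be the sample variance divided by the group size).

```python
def welch_degrees_of_freedom(var_mean_t, var_mean_c, n_t, n_c):
    """Welch-Satterthwaite degrees of freedom from the variances of the group means."""
    numerator = (var_mean_t + var_mean_c) ** 2
    denominator = var_mean_t ** 2 / (n_t - 1) + var_mean_c ** 2 / (n_c - 1)
    return numerator / denominator


def uses_welch(nu, threshold=100):
    """Apply the rule from the text: use Welch's t-test only when nu < threshold."""
    return nu < threshold


# Hypothetical small experiment: sample variances 4.0 and 9.0, 20 and 25 users
n_t, n_c = 20, 25
nu = welch_degrees_of_freedom(4.0 / n_t, 9.0 / n_c, n_t, n_c)
print(f"nu = {nu:.1f}, use Welch's t-test: {uses_welch(nu)}")
```

Given $\nu$, a t-critical value can be obtained from any t-distribution quantile function, for example `scipy.stats.t.ppf(1 - alpha / 2, nu)` if SciPy is available.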

Comparing experiment data to a fixed baseline: one-sample t-test

Sometimes you want to answer questions like "Does my test variant lead to a click-through rate higher than 0.5?" You can define a fixed-baseline comparison when adding metrics to the experiment.

Statsig calculates the confidence interval as:

CI(\Delta \overline X) = (\overline X_{group} - fixed \ value) \pm Z \cdot\sqrt{{var( \overline X_{group})}}
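
A minimal sketch of the fixed-baseline interval for the click-through example above. The formula does not specify which critical value $Z$ denotes, so this sketch assumes a two-sided z-critical value. The function name `fixed_baseline_ci` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def fixed_baseline_ci(mean_group, var_mean_group, fixed_value, alpha=0.05):
    """Return (lower, upper) for the difference between a group mean and a fixed value."""
    # Assumption: two-sided z-critical value
    z = NormalDist().inv_cdf(1 - alpha / 2)
    delta = mean_group - fixed_value
    # Only the group mean contributes variance; the baseline is a constant
    half_width = z * sqrt(var_mean_group)
    return delta - half_width, delta + half_width


# Hypothetical data: click-through rate of 0.53 over 10,000 users vs. a 0.5 baseline
p, n = 0.53, 10_000
var_of_mean = p * (1 - p) / n  # variance of a sample proportion
baseline_lower, baseline_upper = fixed_baseline_ci(p, var_of_mean, fixed_value=0.5)
print(f"95% CI for difference from baseline: [{baseline_lower:.4f}, {baseline_upper:.4f}]")
```

If the whole interval lies above zero, as it does for these inputs, the group mean is significantly higher than the fixed value at the chosen α.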
