p-Value Calculation

In hypothesis testing, the p-value is the probability, assuming the null hypothesis is true, of observing a metric delta at least as extreme as the one measured. In practice, a p-value below your pre-defined significance threshold is treated as evidence of a true effect.

The methodology used for p-value calculation depends on the number of degrees of freedom (ν). A two-sample z-test is appropriate for most experiments. Welch's t-test is used for smaller experiments with ν < 100. In both cases, the p-value depends on the metric mean and variance computed for the test and control groups.

Two-Sample Z-Test

The z-statistic of a two-sample z-test is:

$$Z = \frac{\overline X_t - \overline X_c}{\sqrt{var(\overline X_t) + var(\overline X_c)}}$$
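As a rough illustration, the z-statistic can be computed from each group's mean and the variance of that mean. This is a minimal Python sketch with hypothetical function names; `variance_of_mean` assumes var(X̄) is estimated as the per-unit sample variance divided by the group size N, which may differ from how Pulse estimates it.

```python
import math


def variance_of_mean(sample_variance: float, n: int) -> float:
    """Assumed estimator of var(X̄): per-unit sample variance divided by group size."""
    return sample_variance / n


def z_statistic(mean_t: float, var_mean_t: float,
                mean_c: float, var_mean_c: float) -> float:
    """Two-sample z-statistic: difference of group means over its standard error.

    var_mean_t and var_mean_c are variances of the group *means*, var(X̄_t) and var(X̄_c).
    """
    standard_error = math.sqrt(var_mean_t + var_mean_c)
    return (mean_t - mean_c) / standard_error


# Test mean 1.5, control mean 1.0, each mean with variance 0.125:
# Z = 0.5 / sqrt(0.25) = 1.0
print(z_statistic(1.5, 0.125, 1.0, 0.125))
```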

The two-sided p-value is obtained from the standard normal cumulative distribution function:

$$p\text{-value} = 2 \cdot \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-|Z|} e^{-t^2/2}\,dt$$
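The integral above is the standard normal CDF evaluated at -|Z|, so the two-sided p-value is 2Φ(-|Z|). A minimal sketch using the Python standard library's `statistics.NormalDist` (the function name is hypothetical):

```python
from statistics import NormalDist


def z_test_p_value(z: float) -> float:
    """Two-sided p-value for a z-statistic: 2 * Φ(-|Z|)."""
    return 2.0 * NormalDist().cdf(-abs(z))


print(z_test_p_value(1.96))  # close to 0.05
```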

Welch's t-test

For smaller sample sizes, Welch's t-test is preferred because it keeps the false positive rate lower when the test and control groups have unequal sizes and variances. In Pulse, Welch's t-test is applied automatically when the degrees of freedom ν < 100.

The t-statistic is computed in the same way as the two-sample z-statistic above. Additionally, we compute the degrees of freedom ν using:

$$\nu = \frac{\left(var(\overline X_t) + var(\overline X_c)\right)^2}{\frac{var(\overline X_t)^2}{N_t - 1} + \frac{var(\overline X_c)^2}{N_c - 1}}$$
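The degrees-of-freedom formula translates directly into code. In this sketch (hypothetical names), `var_mean_t` and `var_mean_c` are the variances of the group means, as in the formula, and `n_t` and `n_c` are the group sizes N_t and N_c:

```python
def welch_degrees_of_freedom(var_mean_t: float, n_t: int,
                             var_mean_c: float, n_c: int) -> float:
    """Degrees of freedom ν from the variances of the group means and the group sizes."""
    numerator = (var_mean_t + var_mean_c) ** 2
    denominator = var_mean_t ** 2 / (n_t - 1) + var_mean_c ** 2 / (n_c - 1)
    return numerator / denominator


# Equal group sizes (N = 11) and equal variances of the mean: ν = 2 * (N - 1) = 20
print(welch_degrees_of_freedom(1.0, 11, 1.0, 11))
```

When both groups have the same size N and the same variance of the mean, the formula reduces to ν = 2(N - 1), which makes a convenient sanity check.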

The p-value is then obtained from the t-distribution with ν degrees of freedom.
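The two-sided tail probability of the t-distribution can be written with the regularized incomplete beta function: P(|T| ≥ |t|) = I_x(ν/2, 1/2) with x = ν/(ν + t²). The sketch below evaluates that function with a standard continued-fraction expansion (modified Lentz method), so it runs on the Python standard library alone. It also adds a hypothetical `p_value` helper that mirrors the ν < 100 switch described above. This is an illustration, not Pulse's implementation. If SciPy is available, `2 * scipy.stats.t.sf(abs(t), nu)` should give the same result as `t_test_p_value`.

```python
import math
from statistics import NormalDist

_TINY = 1e-300  # guards against division by zero in Lentz's method


def _beta_continued_fraction(a: float, b: float, x: float,
                             max_iter: int = 300, eps: float = 1e-15) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) >= _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) >= _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) >= _TINY else _TINY
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) >= _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) >= _TINY else _TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b), using the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) for fast convergence."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def t_test_p_value(t: float, nu: float) -> float:
    """Two-sided p-value from the t-distribution with nu degrees of freedom."""
    x = nu / (nu + t * t)
    return regularized_incomplete_beta(nu / 2.0, 0.5, x)


def p_value(stat: float, nu: float, threshold: float = 100.0) -> float:
    """Hypothetical helper: t-distribution when nu < threshold, standard normal otherwise."""
    if nu < threshold:
        return t_test_p_value(stat, nu)
    return 2.0 * NormalDist().cdf(-abs(stat))


print(p_value(1.96, 20.0))   # t-distribution: heavier tails, larger than 0.05
print(p_value(1.96, 500.0))  # normal approximation: close to 0.05
```

For the same statistic, the t-based p-value is larger than the normal one at small ν because the t-distribution has heavier tails, which is what keeps false positives in check for small experiments.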