On this page

Bonferroni Correction

How Statsig applies the Bonferroni correction to adjust p-values when testing multiple metrics or comparisons in an experiment to control false positives.

What is Bonferroni correction?

A Bonferroni Correction is a statistical method that reduces the probability of false positives by adjusting the significance level for multiple comparisons.

If you run a test with α = 0.05, the probability of a false positive is 5%. Running more comparisons at the same significance level increases the chance of at least one false positive, because each comparison is an additional opportunity for a false positive.

Bonferroni correction is an optional feature on Statsig experiments. Statsig divides the significance level (α) by the number of comparisons being evaluated.

You can choose to apply these based on one or both of the following:

  • The number of test groups (multiple treatment hypotheses). Statsig divides the significance level by the number of variants being compared against control.
  • The number of metrics in the scorecard. Here you may select what percentage of your total α is divided evenly among the Primary Metrics, and the remaining α is split equally among Secondary Metrics. For example:
    • Significance level of 0.05
    • 2 Primary Metrics and 4 Secondary Metrics
    • 60% of α applied to Primary Metrics
    • Each Primary Metric is calculated with α = 0.6 * 0.05 / 2 = 0.015
    • Each Secondary Metric is calculated with α = 0.4 * 0.05 / 4 = 0.005
  • If both corrections are selected, Statsig applies them together. In the example above, to also correct for having 2 test groups, divide each α by 2.

When analyzing dimensions, if correction for metrics is enabled, Statsig applies the correction separately for the dimensional breakdown. Statsig uses the number of dimensions as the total metric count for the dimensional analysis, but this doesn't impact topline metrics.

Bonferroni correction configuration interface

How Bonferroni correction affects experiment metrics

In the experiment scorecard section, Statsig derives confidence intervals for applicable metrics from (1 - adjusted α). Hovering over a confidence interval displays the adjusted α alongside other relevant metric details.

In the experiment explore section, Statsig calculates a new adjusted α based on your selections, and the confidence intervals use (1 - adjusted α).

Was this helpful?