Benjamini–Hochberg Procedure

How Statsig applies the Benjamini-Hochberg procedure to control the false discovery rate when analyzing many metrics in an experiment scorecard.

What is the Benjamini-Hochberg procedure

The Benjamini-Hochberg procedure (the "BH" procedure) is a statistical method that limits false positives by adjusting the significance level when you make multiple comparisons. BH is less strict than a Bonferroni correction: Bonferroni controls the chance of at least one false positive across all comparisons (the family-wise error rate), while BH controls the expected proportion of false positives among the null hypotheses you reject (the false discovery rate).
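
The two error rates can be written compactly. The symbols V and R below are introduced here for illustration and are not Statsig terminology: V is the number of true null hypotheses that get rejected (false positives), and R is the total number of rejections.

```latex
% V = number of true null hypotheses rejected (false positives)
% R = total number of null hypotheses rejected
\mathrm{FWER} = \Pr(V \ge 1), \qquad
\mathrm{FDR} = \mathbb{E}\!\left[\frac{V}{\max(R, 1)}\right]
```

Keeping the probability of even a single false positive small is the stricter requirement, which is why a Bonferroni correction tends to flag fewer metrics as significant than BH does.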

You can enable the BH procedure for individual experiments, or configure global Experiment Settings to use it by default.

[Image: Benjamini-Hochberg procedure configuration interface]

How BH adjusts the significance level

The Benjamini-Hochberg Procedure updates the significance level (modifying your pre-set α). Statsig calculates the new significance level by sorting metric p-values in ascending order and comparing each with a paired threshold. Each p-value's paired threshold is the False Discovery Rate you want divided by the number of comparisons, multiplied by the rank of that p-value in the ordered list. The largest threshold value higher than its corresponding p-value becomes the new significance level (the adjusted α).
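
The thresholding steps described above can be sketched in a few lines of Python. This is a minimal illustration, not Statsig's implementation: the function name `bh_adjusted_alpha`, the sample p-values, and the choice to return `None` when no threshold qualifies are all assumptions made for the example.

```python
from typing import Optional, Sequence


def bh_adjusted_alpha(p_values: Sequence[float], fdr: float = 0.05) -> Optional[float]:
    """Return the BH-adjusted significance level for a family of p-values.

    Returns None when no p-value falls at or below its paired threshold
    (an assumption for this sketch; the source text does not cover this case).
    """
    m = len(p_values)
    adjusted = None
    for rank, p in enumerate(sorted(p_values), start=1):
        threshold = fdr * rank / m  # desired FDR / number of comparisons * rank
        # The standard BH formulation treats a tie (p == threshold) as qualifying.
        if p <= threshold:
            adjusted = threshold  # later ranks overwrite, so the largest qualifying threshold wins
    return adjusted


# Hypothetical p-values for four metrics.
p_values = [0.001, 0.03, 0.035, 0.2]
alpha_adj = bh_adjusted_alpha(p_values, fdr=0.05)

# Every p-value at or below the adjusted alpha is treated as significant.
significant = [p for p in p_values if alpha_adj is not None and p <= alpha_adj]
print(alpha_adj, significant)
```

In this example the second p-value (0.03) is above its own paired threshold (0.025), but the third (0.035) is below its threshold (0.0375). The adjusted α is therefore about 0.0375, and all three of the smallest p-values count as significant.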

You can apply the Benjamini-Hochberg Procedure based on:

The number of test groups (multiple treatment hypotheses). For each metric, Statsig aggregates the p-values from every variant and runs the Benjamini-Hochberg procedure on that list.
The number of metrics in the scorecard. For each variant, Statsig aggregates the p-values from every metric and runs the Benjamini-Hochberg procedure on that list.
Both the number of test groups and the number of metrics in the scorecard. Statsig aggregates all p-values into a single list and runs the Benjamini-Hochberg procedure once.
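
The three options differ only in how p-values are grouped into families before the procedure runs. The sketch below illustrates that grouping under stated assumptions: the mode names (`"variants"`, `"metrics"`, `"both"`), the `results` layout, and both function names are hypothetical and do not correspond to Statsig settings or APIs.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def bh_adjusted_alpha(p_values: Sequence[float], fdr: float = 0.05) -> Optional[float]:
    """Largest paired threshold (fdr * rank / m) that is >= its p-value, else None."""
    m = len(p_values)
    adjusted = None
    for rank, p in enumerate(sorted(p_values), start=1):
        threshold = fdr * rank / m
        if p <= threshold:
            adjusted = threshold
    return adjusted


def adjusted_alphas(
    results: Dict[Tuple[str, str], float], mode: str, fdr: float = 0.05
) -> Dict[str, Optional[float]]:
    """Group p-values keyed by (variant, metric) into families, then run BH per family."""
    families: Dict[str, List[float]] = {}
    for (variant, metric), p in results.items():
        if mode == "variants":    # one family per metric, spanning all variants
            key = metric
        elif mode == "metrics":   # one family per variant, spanning all metrics
            key = variant
        elif mode == "both":      # a single family containing every p-value
            key = "all"
        else:
            raise ValueError(f"unknown mode: {mode}")
        families.setdefault(key, []).append(p)
    return {key: bh_adjusted_alpha(ps, fdr) for key, ps in families.items()}


# Hypothetical scorecard: two variants, two metrics.
results = {
    ("A", "revenue"): 0.004,
    ("A", "clicks"): 0.030,
    ("B", "revenue"): 0.020,
    ("B", "clicks"): 0.600,
}

by_variants = adjusted_alphas(results, "variants")
by_metrics = adjusted_alphas(results, "metrics")
by_both = adjusted_alphas(results, "both")
```

With this data, the same p-value can be judged against a different adjusted α depending on the grouping. For example, variant B's metrics are compared against roughly 0.025 when grouping by metrics, but against roughly 0.0375 when all four p-values form one family.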

Statsig doesn't apply the BH procedure when evaluating the p-values of event-dimension or user-property experiment metric results. It compares only the top-line metric results to the new significance level.

How experiment metrics appear after applying BH

In the experiment scorecard section, Statsig derives confidence intervals for applicable metrics from (1 - adjusted α). Hovering over a confidence interval displays the adjusted α alongside other relevant metric details.

In the experiment explore section, Statsig calculates a new adjusted α based on your selections, and the confidence intervals use (1 - adjusted α).
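
To illustrate how an adjusted α changes a confidence interval, the sketch below computes a two-sided interval at the (1 - adjusted α) level. It assumes a normal approximation with a known standard error; the source does not state how Statsig computes its intervals, and the function name and input values are hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def confidence_interval(
    estimate: float, std_error: float, adjusted_alpha: float
) -> Tuple[float, float]:
    """Two-sided (1 - adjusted_alpha) interval under a normal approximation (assumed)."""
    # Critical value that leaves adjusted_alpha / 2 in each tail.
    z = NormalDist().inv_cdf(1 - adjusted_alpha / 2)
    margin = z * std_error
    return estimate - margin, estimate + margin


# Hypothetical metric delta of 1.0 with a standard error of 0.5.
unadjusted = confidence_interval(1.0, 0.5, 0.05)    # pre-set alpha
adjusted = confidence_interval(1.0, 0.5, 0.0375)    # smaller, BH-adjusted alpha
```

Because the adjusted α is no larger than the pre-set α, the resulting interval is at least as wide as the unadjusted one.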
