Benjamini–Hochberg

How Statsig Warehouse Native applies the Benjamini-Hochberg procedure to control false discovery rate across many metrics in experiment scorecards.

What the Benjamini-Hochberg procedure is

The Benjamini-Hochberg Procedure ("BH" procedure) is a statistical method that limits false positives when you test many hypotheses at once by adjusting the significance level for multiple comparisons. The BH procedure is less conservative than a Bonferroni Correction. A Bonferroni Correction controls the probability of at least one false positive (Family-Wise Error Rate). The BH procedure instead controls the expected proportion of false positives among the results you declare significant, that is, among the hypotheses where you reject the null (False Discovery Rate).
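To make the difference concrete, the following minimal Python sketch applies both corrections to the same six p-values at α = 0.05. The p-values and function names are hypothetical, and the code illustrates the textbook procedures rather than Statsig's implementation.

```python
def bonferroni_rejections(p_values, alpha):
    """Indices rejected by Bonferroni: p <= alpha / m (controls FWER)."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def bh_rejections(p_values, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up rule (controls FDR)."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose p-value is at or below (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical p-values for six metrics.
p = [0.001, 0.008, 0.012, 0.039, 0.041, 0.20]
print(bonferroni_rejections(p, 0.05))  # [0, 1]
print(bh_rejections(p, 0.05))          # [0, 1, 2, 3, 4]
```

Bonferroni compares every p-value with α / m, so only two results pass. BH rejects the five smallest p-values, including 0.039, which misses its own threshold of 4/6 × 0.05 ≈ 0.033 but ranks below 0.041, which passes its threshold of 5/6 × 0.05 ≈ 0.042. This step-up behavior is what makes BH less conservative.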

You can enable the BH procedure for individual experiments, or configure global Experiment Settings to enable it by default.

Benjamini-Hochberg procedure configuration interface

Methodology

The Benjamini-Hochberg Procedure updates the significance level, replacing your pre-set α with an adjusted α. Statsig calculates the new significance level by sorting the p-values of the metrics in ascending order and comparing each p-value with a paired threshold. Each p-value's paired threshold is the False Discovery Rate you want (α) divided by the number of comparisons Statsig evaluates (m), multiplied by the rank of that p-value in the ordered list (k). The largest threshold that is greater than or equal to its corresponding p-value becomes the new significance level (adjusted α):

$$\alpha_{\text{adj}} = \max\left\{ \frac{k}{m}\,\alpha \;:\; p_{(k)} \le \frac{k}{m}\,\alpha,\; k = 1, \dots, m \right\}$$

where $p_{(k)}$ is the k-th smallest of the m p-values.
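The threshold search described above can be sketched in a few lines of Python. This is a minimal illustration, not Statsig's implementation: `bh_adjusted_alpha` is a hypothetical helper, and returning `None` when no p-value passes its threshold is an assumed convention, because the procedure as described does not say what the significance level becomes in that case.

```python
def bh_adjusted_alpha(p_values, fdr):
    """Return the BH-adjusted significance level, or None if nothing passes.

    Sort the p-values ascending, pair the k-th smallest with the threshold
    (k / m) * fdr, and keep the largest threshold that is >= its p-value.
    The None return value is a hypothetical convention for this sketch.
    """
    m = len(p_values)
    adjusted = None
    for k, p in enumerate(sorted(p_values), start=1):
        threshold = k / m * fdr
        if p <= threshold:
            # Thresholds grow with k, so the last match is the largest one.
            adjusted = threshold
    return adjusted


# Hypothetical p-values for four metrics, with a target FDR of 0.05.
p_values = [0.003, 0.02, 0.04, 0.30]
alpha_adj = bh_adjusted_alpha(p_values, fdr=0.05)
print(alpha_adj)  # 0.025
```

Here the thresholds are 0.0125, 0.025, 0.0375, and 0.05. The two smallest p-values pass, so the adjusted α is 0.025, and only metrics with p ≤ 0.025 count as significant.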

You can apply the Benjamini-Hochberg Procedure based on:

The number of test groups (multiple treatment hypotheses). For each metric, Statsig aggregates the p-values from every variant and runs the Benjamini-Hochberg procedure on that list.
The number of metrics in the scorecard. For each variant, Statsig aggregates the p-values from every metric and runs the Benjamini-Hochberg procedure on that list.
Both the number of test groups and the number of metrics in the scorecard. Statsig aggregates all p-values into a single list and runs the Benjamini-Hochberg procedure once.
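The three options differ only in how p-values are grouped into families before the procedure runs. The sketch below assumes a hypothetical list of `(variant, metric, p_value)` rows and hypothetical mode names (`"test_groups"`, `"metrics"`, `"both"`). It repeats a compact BH helper so it runs on its own, and it is not Statsig's API.

```python
from collections import defaultdict


def bh_adjusted_alpha(p_values, fdr):
    """Largest (k / m) * fdr that is >= the k-th smallest p-value, else None."""
    m = len(p_values)
    adjusted = None
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k / m * fdr:
            adjusted = k / m * fdr
    return adjusted


def adjusted_alphas(results, mode, fdr=0.05):
    """Group (variant, metric, p_value) rows and run BH within each group.

    mode="test_groups": one family per metric, across variants.
    mode="metrics":     one family per variant, across metrics.
    mode="both":        a single family containing every p-value.
    """
    families = defaultdict(list)
    for variant, metric, p in results:
        if mode == "test_groups":
            key = metric
        elif mode == "metrics":
            key = variant
        elif mode == "both":
            key = "all"
        else:
            raise ValueError(f"unknown mode: {mode}")
        families[key].append(p)
    # One adjusted alpha per family (None if no p-value in it passes).
    return {key: bh_adjusted_alpha(ps, fdr) for key, ps in families.items()}


# Hypothetical top-line results for two treatments and two metrics.
results = [
    ("treatment_a", "revenue", 0.004),
    ("treatment_b", "revenue", 0.030),
    ("treatment_a", "retention", 0.020),
    ("treatment_b", "retention", 0.600),
]
for mode in ("test_groups", "metrics", "both"):
    print(mode, adjusted_alphas(results, mode))
```

With these made-up results, grouping by metric gives revenue and retention their own adjusted α (0.05 and 0.025), grouping by variant leaves `treatment_b` with no passing p-value, and pooling everything yields a single adjusted α of about 0.0375 shared by all four comparisons.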

Statsig doesn't apply the BH procedure when evaluating the p-values of any event-dimension or user-property experiment metric results. Statsig compares only the top-line metric results to the new significance level.

How experiment metrics appear after applying Benjamini-Hochberg

In the experiment scorecard section, Statsig computes (1 - adjusted α) confidence intervals for applicable metrics. Hovering over a confidence interval shows the adjusted α alongside other relevant metric details.
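Because the adjusted α is never larger than the pre-set α, a (1 - adjusted α) confidence interval is at least as wide as a (1 - α) interval. The sketch below assumes a two-sided normal-approximation interval (estimate ± z × standard error), which may not match how Statsig computes its intervals; the function name and the example lift and standard error are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, alpha):
    """Two-sided (1 - alpha) normal-approximation confidence interval."""
    # Critical value of the standard normal for a two-sided test.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical metric: +1.0% lift with a standard error of 0.48 percentage points.
print(confidence_interval(0.010, 0.0048, alpha=0.05))   # unadjusted 95% interval
print(confidence_interval(0.010, 0.0048, alpha=0.025))  # BH-adjusted 97.5% interval
```

With these made-up numbers, the unadjusted 95% interval stays above zero, while the 97.5% interval from an adjusted α of 0.025 crosses zero, so the same result would no longer appear statistically significant.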

In the experiment explore section, Statsig calculates a new adjusted α based on your selections, and the confidence intervals use (1 - adjusted α).
