Holdouts

Holdouts measure the aggregate impact of multiple features. A holdout involves a group of users who are held back from a set of features for measurement. While each A/B test or experiment you run compares control and test groups for a single feature, a holdout compares the “holdout” group (Control) against users who have been exposed to multiple features and experiments.

How to use Holdouts

To create a new holdout, navigate to the Holdouts section of the Statsig console. A holdout is a specialized kind of experiment.
Click the Create New button and enter the name, description and unit type of the holdout that you want to create.
You can create either a global or a selected holdout. A global holdout is automatically added to any new feature with the same unit type, and is meant to capture the aggregate impact of all features developed after the holdout began (individual features may be opted out as needed). A selected holdout captures the aggregate impact of a specific selection of features that you want to hold back.
By default, Holdouts apply to a percentage of all users (Population = Everyone). You can optionally target the Holdout at a subset of users by applying a Targeting Gate (Population = Targeting Gate). For example, for an iOS-only Holdout, you could apply a Targeting Gate that only passes iOS users.
You must set the percentage of users to be held out to a value between 1% and 10%. Statsig recommends a small holdout percentage to limit the number of customers who don’t see new features.
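
The membership rules in the steps above can be sketched as follows. This is only an illustration, not Statsig's implementation: the function names `bucket` and `in_holdout`, the SHA-256 hashing scheme, the salt, and the 10,000-bucket granularity are all assumptions made for this example.

```python
import hashlib


def bucket(unit_id: str, salt: str, buckets: int = 10_000) -> int:
    """Map a unit ID to a stable bucket (hypothetical hashing scheme)."""
    digest = hashlib.sha256(f"{salt}.{unit_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def in_holdout(unit_id: str, salt: str, percent: float,
               passes_targeting: bool = True, buckets: int = 10_000) -> bool:
    """Return True if the unit is held out.

    percent must be between 1 and 10, matching the range described above.
    passes_targeting models Population = Targeting Gate; leave it True for
    Population = Everyone.
    """
    if not 1 <= percent <= 10:
        raise ValueError("holdout percentage must be between 1 and 10")
    if not passes_targeting:
        # Units outside the targeted population are never held out.
        return False
    return bucket(unit_id, salt, buckets) < percent * buckets / 100


# Example: a 2% holdout limited to iOS users (the platform check is assumed).
user = {"id": "user-42", "platform": "ios"}
held_out = in_holdout(user["id"], "my_holdout", 2,
                      passes_targeting=user["platform"] == "ios")
```

Because the bucket depends only on the salt and the unit ID, the same unit gets the same answer on every evaluation, which is what keeps the held-out group stable over the life of the holdout.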

How to read Holdouts

Holdouts on Statsig use the same “equal variant” methodology as Feature Gate rollouts: holdout lift is computed by comparing equal-sized groups. You can read more about the advantages of this methodology in “A/B Testing Intuition Busters: Common Misunderstandings in Online Controlled Experiments” by Ron Kohavi, Alex Deng, & Lukas Vermeer. Accordingly, the Cumulative Exposures panel for a given Holdout shows total exposures of the Holdout, broken down into three groups:

Units that were included in the Holdout, and were used for analysis
Units that were not included in the Holdout, and were used for analysis vs. the holdout group
Units that were not included in the Holdout, and were not used for analysis

Units not included in the Holdout are split into two groups: those used for analysis and those not used for analysis. The group used for analysis is chosen at random and is meant to balance the comparison. Holdout metric lifts represent the cumulative impact of launched and active experiments on the Holdout group vs. the same % of the rest of the population, who were subject to the included rollouts and experiments. In the example below, the 1% Holdout compares the metric values of the users in the Holdout against a 1% group drawn from users not in the Holdout. The launched features are having an overall negative effect on the “Add to Cart” metric.

Figure: Holdout Pulse results showing the metric lift comparison between holdout and exposed users.
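
The equal-sized comparison described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not how Statsig computes results: the function name `holdout_lift`, the input shape (a metric value per unit), and the use of a simple relative difference of means are all hypothetical.

```python
import random
from statistics import mean


def holdout_lift(metric_by_unit: dict[str, float],
                 holdout_ids: set[str], seed: int = 0) -> dict:
    """Compare the holdout with an equal-sized random sample of other units."""
    holdout = sorted(u for u in metric_by_unit if u in holdout_ids)
    rest = sorted(u for u in metric_by_unit if u not in holdout_ids)
    if not holdout or not rest:
        raise ValueError("need units both inside and outside the holdout")

    # Randomly choose as many non-holdout units as there are holdout units.
    rng = random.Random(seed)
    compared = rng.sample(rest, min(len(holdout), len(rest)))

    control_mean = mean(metric_by_unit[u] for u in holdout)   # holdout = Control
    exposed_mean = mean(metric_by_unit[u] for u in compared)  # saw the features
    if control_mean == 0:
        raise ValueError("relative lift is undefined for a zero control mean")

    return {
        "in_holdout_used": len(holdout),
        "not_in_holdout_used": len(compared),
        "not_in_holdout_unused": len(rest) - len(compared),
        "lift": (exposed_mean - control_mean) / control_mean,
    }
```

The three counts in the result correspond to the three groups in the Cumulative Exposures breakdown. A negative `lift` means the exposed group scored lower than the holdout, as with the “Add to Cart” metric in the example.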

Best Practices

Size - Statsig recommends a low single-digit holdout percentage, say 1% – 2%, to limit the number of customers who don’t see new features.
Duration - Statsig recommends operating holdouts for a period of three to six months, and then releasing the holdout. Prolonging the holdout period may increase the complexity of your software as you’d have to maintain a functioning product with no new features for a longer period.
Back testing - Occasionally you may want to turn off a set of features that you have already released to measure the effectiveness of those features. Statsig doesn’t recommend this as it turns off features that users are already using and relying on. However, when a “back measurement” is critical, you can use Holdouts to turn off a set of features and automatically compute the impact of this set of features.

Unit ID Types

By default, holdouts are based on User ID. To use a different ID type, select it from the dropdown menu during holdout creation.

Holdouts can only be applied to Experiments and Feature Gates that use the same randomization unit. If a team plans to run experiments on both User ID and Stable ID, two separate holdouts are required to evaluate the cumulative impact of each type of experiment.

Holdout effects on Gates & Experiments SDK methods

Feature Flags/Gates

For users in holdout, gates will always return False.

Experiments

For users in holdout, if the experiment is not in a Layer, calls to get experiment parameters will always return the “default value” passed in code.
For users in holdout, if the experiment is in a Layer, calls to get experiment parameters will return the values defined in the Layer defaults in the Statsig console.

Shipping an experiment in a Layer normally updates the Layer defaults. Users in the holdout do not see those updated defaults; instead, the Layer keeps a separate set of default parameters just for held-out users.
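
The rules in this section can be modeled with a small, self-contained sketch. It does not use the Statsig SDK: `Layer`, `Experiment`, `check_gate`, `get_param`, and `ship_experiment` are hypothetical names invented for this example, and the separate `holdout_defaults` field is an assumption about how the held-out defaults could be represented.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Layer:
    defaults: dict[str, Any]          # updated when an experiment ships
    holdout_defaults: dict[str, Any]  # kept unchanged for held-out users


@dataclass
class Experiment:
    group_params: dict[str, Any]      # values for the user's assigned group
    layer: Optional[Layer] = None


def check_gate(gate_passes: bool, in_holdout: bool) -> bool:
    """Gates always evaluate to False for users in the holdout."""
    return False if in_holdout else gate_passes


def get_param(exp: Experiment, name: str, code_default: Any,
              in_holdout: bool) -> Any:
    """Resolve an experiment parameter, honoring the holdout rules above."""
    if in_holdout:
        if exp.layer is None:
            # Not in a Layer: fall back to the default passed in code.
            return code_default
        # In a Layer: use the defaults reserved for held-out users.
        return exp.layer.holdout_defaults.get(name, code_default)
    return exp.group_params.get(name, code_default)


def ship_experiment(layer: Layer, winning_params: dict[str, Any]) -> None:
    """Shipping updates the Layer defaults but not the held-out defaults."""
    layer.defaults.update(winning_params)
```

For example, after `ship_experiment(layer, {"color": "red"})`, a held-out user calling `get_param` on an experiment in that Layer still receives the value from `holdout_defaults`, while other users receive the shipped value.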

Ending a Holdout

To end a holdout and allow users in the holdout group to see all held-out features, you can disable the holdout. Disabling it stops tracking the effects of those features, but the results will still be retained for future reference. Alternatively, you can delete the holdout if it was created by mistake or if you no longer need to keep the results.
