Configuring Experiments

Reference for advanced experiment options in Statsig Warehouse Native, including variance reduction, CUPED, sequential testing, and stratified sampling.

Configuration

Advanced experiment settings in Statsig Warehouse Native tune how an experiment measures impact, so you can maximize statistical power and measure exactly what you intend. Default options still produce trustworthy statistical analysis, and each setting has tradeoffs to weigh before you change it.

Experiment settings

Basic settings

Hypothesis

Statsig requires a hypothesis to run experiments. Hypotheses should specify what an experiment aims to accomplish and how you measure that.

Primary and secondary metrics

These are the metrics used as the evaluation criteria for your experiment. Generally, Statsig recommends a small number of primary metrics as your overall evaluation criteria, and putting guardrails and exploratory metrics into Secondary.

Statsig provides project-level configuration of the maximum number of primary/secondary metrics. The appropriate number varies by company and industry, as well as the complexity of the space you're measuring.

Experiment duration

In experiment setup there are fields for Experiment Measured In and Target. These configure how long your experiment runs, influencing sequential testing as well as notifications/timeline alerts.

Statsig's power analysis tools are the best way to determine the target duration. You can attach a power analysis to an experiment to add context on the duration.

Experiment configuration

Assignment source and groups

For analysis-only experiments, Statsig pre-fills this section from the observed data in the data warehouse. You can configure images and descriptions. Statsig infers group sizes, but you should review and correct them if they don't match the intended traffic split.

To update these settings after the experiment has started, reset the experiment from the decision menu.

Groups and parameters

For end-to-end experiments (experiments using Statsig for both assignment and analysis), this section is where you configure targeting, layers, and the groups and associated parameters. Statsig automatically associates exposures generated from the setup with this experiment for analysis.

Advanced settings

Statsig provides many advanced settings for customizing analysis. These can have complex interactions with data.

Pre-computed user dimensions

Configure dimensions as default breakdowns in Pulse. For example, specify a user dimension such as country here to make it available in the scorecard results with daily loads.

This lets you skip scheduled explore queries for this dimension, and the results appear inline in the scorecard.

Stratified sampling

Stratified sampling lets you balance experiments across behavior or segments. Statsig tests random salts and picks the one that best balances user attributions. You can partially achieve this during analysis using CUPED. Refer to Stratified Sampling for more details.
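The salt-selection idea can be sketched as follows. This is a minimal illustration, not Statsig's implementation: the hashing scheme, the two-group split, and the balance criterion (absolute difference in a pre-experiment covariate mean) are all assumptions, as are the names assign_group, imbalance, and pick_salt.

```python
import hashlib
from statistics import mean

def assign_group(salt: str, unit_id: str) -> str:
    # Hypothetical deterministic 50/50 split based on a salted hash.
    digest = hashlib.sha256(f"{salt}.{unit_id}".encode()).hexdigest()
    return "test" if int(digest, 16) % 2 == 0 else "control"

def imbalance(salt: str, covariate_by_unit: dict) -> float:
    # Absolute difference in covariate means between the two groups.
    groups = {"test": [], "control": []}
    for unit_id, value in covariate_by_unit.items():
        groups[assign_group(salt, unit_id)].append(value)
    if not groups["test"] or not groups["control"]:
        return float("inf")  # a salt that leaves a group empty is unusable
    return abs(mean(groups["test"]) - mean(groups["control"]))

def pick_salt(candidate_salts: list, covariate_by_unit: dict) -> str:
    # Try every candidate salt and keep the one with the smallest imbalance.
    return min(candidate_salts, key=lambda s: imbalance(s, covariate_by_unit))

# Made-up pre-experiment revenue per user, for illustration only.
past_revenue = {f"user_{i}": float(i % 7) for i in range(200)}
salts = [f"salt_{i}" for i in range(20)]
best = pick_salt(salts, past_revenue)
```

A real setup would likely balance several attributes at once; a single covariate keeps the sketch short.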

ID type and secondary ID type

You configure the ID type, which is the unit of randomization for an experiment, when setting up the experiment. This is a critical field that represents the kind of entity you're experimenting on. For example, if splitting traffic randomly per user, this should be User ID. If splitting traffic by company, this should be Company ID.

Secondary ID types associate metrics from a different ID with the unit of randomization. For example, consider an experiment on a logged-out Cookie ID. Specifying User ID as a secondary ID allows analysis of User ID metrics like revenue while keeping Cookie ID as the unit of analysis (for example, the denominator in means). This lets you understand the downstream impact of experiments without introducing survivorship bias or other bias to the analysis.

This mapping can come directly from the assignment source. If multiple exposures exist for a given unit, and at least one has both ID types, Statsig identifies that these two IDs map to each other. Alternatively, you can specify mappings in an Entity Property Source: provide a mapping table of ID1 to ID2 and Statsig connects the data during analysis.

You can configure secondary ID mapping as an enforced 1:1 mapping or as first-touch attribution. Refer to the ID Resolution documentation for more details.
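The difference between the two mapping modes can be illustrated with a small sketch. The row layout and function names are hypothetical, and dropping ambiguous units under the enforced 1:1 mode is an assumption; the ID Resolution documentation describes the actual behavior.

```python
from collections import defaultdict

# Hypothetical exposure rows: (timestamp, cookie_id, user_id or None).
exposures = [
    (1, "c1", None),
    (2, "c1", "u1"),   # c1 is seen with exactly one user
    (3, "c2", "u2"),
    (4, "c2", "u3"),   # c2 is seen with two different users
    (5, "c3", None),   # c3 never has a user ID
]

def first_touch_mapping(rows):
    # Map each cookie to the first user ID observed for it.
    mapping = {}
    for _, cookie_id, user_id in sorted(rows, key=lambda r: r[0]):
        if user_id is not None and cookie_id not in mapping:
            mapping[cookie_id] = user_id
    return mapping

def strict_one_to_one_mapping(rows):
    # Keep only pairs where the cookie maps to one user and vice versa.
    # Assumption: ambiguous units are dropped from the mapping.
    users_by_cookie = defaultdict(set)
    cookies_by_user = defaultdict(set)
    for _, cookie_id, user_id in rows:
        if user_id is not None:
            users_by_cookie[cookie_id].add(user_id)
            cookies_by_user[user_id].add(cookie_id)
    mapping = {}
    for cookie_id, users in users_by_cookie.items():
        if len(users) == 1:
            (user_id,) = users
            if len(cookies_by_user[user_id]) == 1:
                mapping[cookie_id] = user_id
    return mapping

print(first_touch_mapping(exposures))        # {'c1': 'u1', 'c2': 'u2'}
print(strict_one_to_one_mapping(exposures))  # {'c1': 'u1'}
```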

Allocation and cohorting

Configure Allocation Duration

If you use a persistent assignment SDK, this setting controls when to stop enrolling users into the experiment.

You can also use this setting without a persistent assignment SDK to filter out users exposed after the duration period. This is useful for enforcing even cohorts across a user base. For example, suppose a metric takes 7 days to mature and stopping the experiment halts its effect. Set the allocation duration to 14 days and run the experiment for 21 days. This analyzes the first 14 days of exposed units while capturing all 7 days of the last cohort's metric behavior.
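The arithmetic behind that example can be sketched as follows. The start date and the day-counting convention (the allocation window is half-open, so day 14 is the last enrollment day) are assumptions for illustration.

```python
from datetime import date, timedelta

start = date(2024, 1, 1)   # hypothetical experiment start
allocation_days = 14       # stop enrolling after 14 days
run_days = 21              # total experiment length
metric_maturity_days = 7   # the metric needs 7 days to mature

allocation_end = start + timedelta(days=allocation_days)  # exclusive bound
experiment_end = start + timedelta(days=run_days)         # exclusive bound

def is_analyzed(first_exposure: date) -> bool:
    # Units first exposed after the allocation window are filtered out.
    return start <= first_exposure < allocation_end

# The last enrolled cohort still gets a full maturity window before the end.
last_cohort_day = allocation_end - timedelta(days=1)
last_cohort_matures = last_cohort_day + timedelta(days=metric_maturity_days)

print(is_analyzed(date(2024, 1, 14)))         # True
print(is_analyzed(date(2024, 1, 15)))         # False
print(last_cohort_matures <= experiment_end)  # True
```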

Configure Analysis Period

This setting controls the dates from which Statsig collects metric data. It's useful for truncating the analysis window while continuing to run an experiment.

Allow cohort metrics to mature after experiment end

This setting allows cohort or baked metrics that take time to mature to continue collecting data after the end of an experiment. Statsig recommends this only if an experiment is a one-time intervention.

For example, consider a 14-day experiment on new users that modifies a signup page. Removing the changes to the signup page doesn't impact users who already saw it. To maximize data for the "first-week revenue" metric, enable this setting. Data then continues to collect after the experiment ends on day 14: for users enrolled on day 14, until day 21; for users enrolled on day 10, until day 17.
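The collection windows in this example follow a simple rule, sketched below. The function name is hypothetical, and the behavior with the setting disabled (collection stops at the experiment's last day) is an assumption inferred from the description above.

```python
def last_collection_day(enrollment_day: int,
                        experiment_days: int = 14,
                        maturity_days: int = 7,
                        allow_maturation: bool = True) -> int:
    # Day on which the metric for this cohort would be fully matured.
    matured_day = enrollment_day + maturity_days
    if allow_maturation:
        return matured_day
    # Assumption: without the setting, data stops at the experiment end.
    return min(matured_day, experiment_days)

print(last_collection_day(14))                          # 21
print(last_collection_day(10))                          # 17
print(last_collection_day(10, allow_maturation=False))  # 14
```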

CUPED covariates

This section lets you configure covariates for CUPED. Configure strong defaults in the project settings, and use this section to add relevant/domain-specific covariates. Refer to the CUPED documentation for more details.
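For intuition about why a good covariate helps, here is a minimal single-covariate CUPED adjustment. It is the textbook form and not necessarily what Statsig runs; the data and the function name cuped_adjust are made up for illustration.

```python
from statistics import mean, variance

def cuped_adjust(pre: list, post: list) -> list:
    # theta = cov(pre, post) / var(pre), estimated from the sample.
    pre_mean, post_mean = mean(pre), mean(post)
    cov = sum((x - pre_mean) * (y - post_mean)
              for x, y in zip(pre, post)) / (len(pre) - 1)
    theta = cov / variance(pre)
    # Subtract the part of each outcome explained by the covariate.
    return [y - theta * (x - pre_mean) for x, y in zip(pre, post)]

# Hypothetical pre-experiment covariate and in-experiment metric per unit.
pre = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
post = [1.5, 2.1, 3.4, 3.9, 5.2, 6.3]
adjusted = cuped_adjust(pre, post)

# The mean is preserved while the variance shrinks.
print(abs(mean(adjusted) - mean(post)) < 1e-9)  # True
print(variance(adjusted) < variance(post))      # True
```

The more strongly a covariate correlates with the metric, the larger the variance reduction, which is why domain-specific covariates are worth adding.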

Analysis settings

Analytics type

Whether to use Frequentist or Bayesian analysis. You can't change this once an experiment starts, to avoid cherry-picking methodology.

Apply Sequential Testing [Frequentist Only]

Controls whether Statsig applies sequential testing. Statsig recommends this setting to avoid false positives from peeking.

Bonferroni/Benjamini-Hochberg [Frequentist Only]

Configures multiple-comparison corrections: Bonferroni controls the family-wise false positive rate, and Benjamini-Hochberg controls the false discovery rate. Refer to the more detailed documentation for Bonferroni and Benjamini-Hochberg.
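The two corrections can be compared on a small set of p-values. This sketch uses the standard textbook procedures; how Statsig groups comparisons (per metric, per variant, or both) is not covered here, and the p-values are made up.

```python
def bonferroni(p_values: list, alpha: float = 0.05) -> list:
    # Reject only p-values below alpha divided by the number of tests.
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values: list, alpha: float = 0.05) -> list:
    # Find the largest rank k with p_(k) <= alpha * k / m, then reject
    # every hypothesis ranked at or below k.
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            max_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[i] = True
    return rejected

p_values = [0.001, 0.012, 0.025, 0.045, 0.20]
print(bonferroni(p_values))          # [True, False, False, False, False]
print(benjamini_hochberg(p_values))  # [True, True, True, False, False]
```

Bonferroni is the more conservative of the two, so it flags fewer metrics on the same inputs.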

Default Confidence Interval/Chance to Beat Threshold

The confidence level used for this experiment. The default is 95%, which is the recommended industry standard. Depending on the risk profile of the experiment, a stricter or less strict setting may be appropriate.
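To see what a stricter or looser setting means in practice, the sketch below converts a confidence level into a critical value. It assumes a two-sided test with a normal approximation, which this page does not specify.

```python
from statistics import NormalDist

def two_sided_critical_value(confidence_level: float) -> float:
    # Split the remaining probability evenly between the two tails.
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(two_sided_critical_value(level), 2))
# 0.9 1.64
# 0.95 1.96
# 0.99 2.58
```

A higher confidence level produces a larger critical value, so intervals widen and the same effect needs more data to reach significance.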

Use Informative Priors [Bayesian Only]

For Bayesian experiments, whether to use informative priors in analysis, and the configuration for them.

Turbo Mode

Whether to use Turbo Mode to run experiment reloads more quickly.

Filter Exposures by Qualifying Event

This setting allows filtering exposures to experimental units that did (or did not) trigger a secondary event beyond exposure. This is useful for analysis-only experiments on web experimentation platforms that over-expose heavily. For end-to-end experiments, use Statsig to expose only at the point of intervention.

This setting has a few inputs:

A qualifying event, which Statsig joins to exposures to determine if users triggered an event
Exclude Matching Units: if you check it, Statsig drops units that triggered this event from the analysis. If you leave it unchecked, Statsig filters the analysis to units that triggered the event.
Use qualifying event timestamp for first exposures: if you check it, Statsig replaces units' exposure timestamp with the qualifying event's timestamp. This is useful for small cohort windows, for example, measuring whether something happened within 10 minutes of the intervention. If the intervention occurred several minutes after the original exposure event, using the actual time the user saw the intervention can be helpful.
Filter events by time window: only consider qualifying events for the inclusion/exclusion filters that occurred within a certain time from the exposure. This is useful if a user may return and re-trigger the qualifying event when it's no longer relevant to the exposure.
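The four inputs above interact roughly as in this sketch. The data shapes and the function name are hypothetical, and ignoring qualifying events that occur before the exposure is an assumption.

```python
from datetime import datetime, timedelta

def filter_exposures(exposures, qualifying_events, exclude_matching=False,
                     use_event_timestamp=False, window=None):
    # exposures: {unit_id: first exposure time}
    # qualifying_events: list of (unit_id, event time)
    first_event = {}
    for unit_id, ts in qualifying_events:
        exposed_at = exposures.get(unit_id)
        if exposed_at is None or ts < exposed_at:
            continue  # assumption: events before exposure do not qualify
        if window is not None and ts - exposed_at > window:
            continue  # outside the configured time window
        if unit_id not in first_event or ts < first_event[unit_id]:
            first_event[unit_id] = ts
    result = {}
    for unit_id, exposed_at in exposures.items():
        matched = unit_id in first_event
        if matched == exclude_matching:
            continue  # keep matched units, or unmatched ones when excluding
        if matched and use_event_timestamp:
            result[unit_id] = first_event[unit_id]
        else:
            result[unit_id] = exposed_at
    return result

t0 = datetime(2024, 4, 10, 12, 0)
exposures = {"a": t0, "b": t0, "c": t0}
events = [("a", t0 + timedelta(minutes=5)), ("b", t0 + timedelta(days=3))]
one_day = timedelta(days=1)

kept = filter_exposures(exposures, events, window=one_day)
excluded = filter_exposures(exposures, events, exclude_matching=True,
                            window=one_day)
retimed = filter_exposures(exposures, events, use_event_timestamp=True,
                           window=one_day)
print(sorted(kept))      # ['a']
print(sorted(excluded))  # ['b', 'c']
print(retimed["a"])      # 2024-04-10 12:05:00
```

Unit "b" triggered the event three days after exposure, so with a one-day window it counts as not having triggered it.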

Filter Assignment Source

This setting controls additional filters applied to the exposure data for this experiment. This can be useful to filter out bad dates with data known to be biased or non-representative, or to filter to a specific subset of interest for the scorecard results.

Default Date Filter

Statsig calculates results across multiple rollups, including Cumulative, 7 days, and 1 day. Cumulative is almost always the correct choice for this setting because it maximizes statistical power. You can select other views in the scorecard.

Explore (custom) query settings

Explore queries enable drilldown, filtering, grouping, and more advanced ways to cut metric data by user and metric properties. They provide a way to analyze experiment results in depth. Explore queries run with the same statistics as the scorecard; with no filters or other settings applied, results match the scorecard results.

Metrics

Pick the metrics to run the explore or drilldown analysis on. This can include tags and local metrics.

Group By

This setting is the primary tool for explore and custom queries. It allows drilldowns into user segments to understand differentiated performance. You can combine it with differential impact detection to analyze experiment results beyond topline averages.

By default, Statsig shows only the top 10 dimension levels with over 100 units per segment in results. Statsig groups the rest into an OTHER category. This prevents inflated false positive rates and keeps statistical analysis rigorous. Contact the Statsig team in Slack if you need this limit adjusted.
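The default bucketing can be approximated with the sketch below. Ranking levels by total unit count is an assumption (the exact ranking and per-segment counting rules are not described here), and the names are hypothetical.

```python
from collections import Counter

def bucket_dimension_levels(unit_levels: dict, max_levels: int = 10,
                            min_units: int = 100) -> dict:
    # Count units per level, keep the largest levels above the threshold,
    # and relabel everything else as OTHER.
    counts = Counter(unit_levels.values())
    eligible = [lvl for lvl, n in counts.most_common() if n > min_units]
    kept = set(eligible[:max_levels])
    return {unit: (lvl if lvl in kept else "OTHER")
            for unit, lvl in unit_levels.items()}

# Made-up data: 12 large countries plus one small one.
unit_levels = {}
for i in range(12):
    for j in range(150 - i):
        unit_levels[f"u{i}_{j}"] = f"country_{i}"
for j in range(50):
    unit_levels[f"small_{j}"] = "country_small"

bucketed = bucket_dimension_levels(unit_levels)
shown = set(bucketed.values()) - {"OTHER"}
print(len(shown))             # 10
print(bucketed["small_0"])    # OTHER
print(bucketed["u11_0"])      # OTHER
```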

Filter

Use filters to include or exclude certain cohorts of units in an explore analysis by specifying the property and a filter set.

Time range for metric data

This setting filters metric data to a specific date period, for example the three days from 2024-04-10 to 2024-04-12. This is useful for investigating data anomalies, analyzing a recent cohort when results have changed significantly, or running drill-down analyses.

Filter by exposure date

In some cases it's useful to filter to units that entered the experiment on a specific date, or to exclude them (for example, units exposed on a holiday may behave atypically). You can specify filters based on user exposure date.

By switching the mode to Days Since First Exposure, you can also drill down into each cohort's behavior during a specific window after exposure. This is a useful way to understand behaviors observed in the days-since-first-exposure timeline view.
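As a rough illustration of the Days Since First Exposure mode, the sketch below keeps metric rows that fall within a window relative to each unit's first exposure. Treating the exposure date as day 0 is an assumption, and the function names are hypothetical.

```python
from datetime import date

def days_since_first_exposure(first_exposure: date, metric_date: date) -> int:
    # Assumption: the day of first exposure counts as day 0.
    return (metric_date - first_exposure).days

def in_window(first_exposure: date, metric_date: date,
              start_day: int, end_day: int) -> bool:
    # Inclusive window expressed in days since first exposure.
    offset = days_since_first_exposure(first_exposure, metric_date)
    return start_day <= offset <= end_day

exposed = date(2024, 4, 10)
print(in_window(exposed, date(2024, 4, 12), 0, 6))  # True
print(in_window(exposed, date(2024, 4, 20), 0, 6))  # False
```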

Scheduling

After running a query, you can schedule it. Scheduled queries run daily after any scheduled Pulse load to update results. You can also add scheduled queries to the experiment Summary page as part of the experiment writeup and downloadable summary.

Interaction effect detection

Explore queries can trigger interaction effect detection analyses to determine whether two experiments are interacting in a synergistic or harmful way.
