Configuring Experiments
Reference for advanced experiment options in Statsig Warehouse Native, including variance reduction, CUPED, sequential testing, and stratified sampling.
Configuration
Statsig Warehouse Native offers extensive configurability in experiment setup and analysis. You can run an experiment with default options and get powerful, trustworthy statistical analysis of your results, but in many cases an advanced configuration helps maximize statistical power and measure exactly what you intend to measure.
This page is a glossary for advanced experimentation settings: what they do, how to use them, and tradeoffs to consider.
Experiment settings
Basic settings
Hypothesis
Hypotheses are required to run experiments in Statsig. Hypotheses should specify what an experiment aims to accomplish and how that is measured.
Primary and secondary metrics
These are the metrics used as the evaluation criteria for your experiment. Generally, Statsig recommends a small number of primary metrics as your overall evaluation criteria, and putting guardrails and exploratory metrics into Secondary.
Statsig provides project-level configuration of the maximum number of primary and secondary metrics. The right limit varies by company and industry, and with the complexity of the space being measured.
Experiment duration
In experiment setup there are fields for Experiment Measured In and Target. These configure how long your experiment runs, influencing sequential testing as well as notifications/timeline alerts.
Experiment configuration
Assignment source and groups
For analysis-only experiments, Statsig pre-fills this section from the observed data in the data warehouse. You can configure images and descriptions. Statsig infers group sizes, but you should review and correct them if they don't match the intended traffic split.
To update this section after the experiment has started, reset the experiment from the decision menu.
Groups and parameters
For end-to-end experiments (experiments using Statsig for both assignment and analysis), this section is where you configure targeting, layers, and the groups and associated parameters. Statsig automatically associates exposures generated from the setup with this experiment for analysis.
Advanced settings
A large number of advanced settings are available for customizing analysis. These can have complex interactions with data.
Pre-computed user dimensions
Configure dimensions as default breakdowns in Pulse. For example, specify a user dimension such as country here to make it available in the scorecard results with daily loads.
This lets you skip scheduled explore queries for this dimension, and the results appear inline in the scorecard.
Stratified sampling
Stratified Sampling lets you balance experiments across behavior or segments. Statsig tests random salts and picks the one that best balances user attributions. You can partially achieve this during analysis using CURE. Refer to Stratified Sampling for more details.
ID type and secondary ID type
You configure the ID type, which is the unit of randomization for an experiment, when setting up the experiment. This is a critical field that represents the kind of entity being experimented on. For example, if splitting traffic randomly per user, this should be User ID. If splitting traffic by company, this should be Company ID.
Secondary ID types associate metrics from a different ID with the unit of randomization. For example, when running an experiment on a logged-out cookie ID, specifying UserID as a secondary ID allows analysis of UserID metrics like revenue while keeping Cookie IDs as the unit of analysis (for example, the denominator in means). This lets you understand the downstream impact of experiments without introducing survivorship bias or other bias to the analysis.
This mapping can come directly from the assignment source. If multiple exposures are logged for a given unit, and at least one has both ID types, Statsig identifies that these two IDs are mapped to each other. Alternatively, you can specify mappings in an Entity Property Source: provide a mapping table of ID1 to ID2 and Statsig connects the data during analysis.
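The secondary ID idea can be illustrated with a minimal sketch in plain Python. This is not Statsig's implementation: the row shapes and the function names `infer_mapping` and `mean_revenue_per_cookie` are hypothetical, and the sketch assumes each cookie maps to at most one user. It infers the cookie-to-user mapping from exposures that carry both IDs, then attributes user-level revenue to cookie units while keeping every exposed cookie in the denominator.

```python
from collections import defaultdict

# Hypothetical exposure rows: (cookie_id, user_id or None, group)
exposures = [
    ("c1", None, "control"),
    ("c1", "u1", "control"),  # carries both IDs, so c1 maps to u1
    ("c2", None, "test"),     # never logged in: stays in the denominator
    ("c3", "u3", "test"),
]
# Hypothetical user-level metric rows: (user_id, revenue)
revenue_events = [("u1", 10.0), ("u3", 4.0), ("u3", 6.0)]

def infer_mapping(rows):
    """Map each cookie ID to a user ID seen alongside it in any exposure."""
    mapping = {}
    for cookie_id, user_id, _group in rows:
        if user_id is not None:
            mapping.setdefault(cookie_id, user_id)
    return mapping

def mean_revenue_per_cookie(rows, events):
    """Mean user-level revenue per exposed cookie, by experiment group."""
    mapping = infer_mapping(rows)
    revenue_by_user = defaultdict(float)
    for user_id, amount in events:
        revenue_by_user[user_id] += amount
    group_of = {cookie_id: group for cookie_id, _user, group in rows}
    totals, counts = defaultdict(float), defaultdict(int)
    for cookie_id, group in group_of.items():
        counts[group] += 1  # every exposed cookie counts, mapped or not
        totals[group] += revenue_by_user.get(mapping.get(cookie_id), 0.0)
    return {group: totals[group] / counts[group] for group in counts}

print(mean_revenue_per_cookie(exposures, revenue_events))
```

Because unmapped cookies contribute zero revenue but still count as units, the mean is not restricted to users who logged in, which is how this approach avoids survivorship bias.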
Allocation and cohorting
Configure Allocation Duration
If using a persistent assignment SDK (docs), this setting controls when to stop enrolling users into the experiment. This setting can also be used without a persistent assignment SDK to filter out users exposed after the duration period. This is useful for enforcing even cohorts across a user base. For example, if a metric takes 7 days to mature and stopping the experiment halts its effect, set the allocation duration to 14 days and run the experiment for 21 days. This analyzes the first 14 days of exposed units while capturing all 7 days of the last cohort's metric behavior.
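The arithmetic in this example can be sketched in Python. The helper names and the start date are hypothetical, and the exact boundary handling (whether the last allocation day is inclusive) is an assumption rather than documented Statsig behavior.

```python
from datetime import date, timedelta

def required_run_days(allocation_days, metric_maturity_days):
    """Days to run so the last enrolled cohort gets a full metric window."""
    return allocation_days + metric_maturity_days

def is_analyzed(first_exposure, start, allocation_days):
    """True if the unit was first exposed inside the allocation window.

    Assumption: the window covers `allocation_days` calendar days
    beginning on `start`.
    """
    return start <= first_exposure < start + timedelta(days=allocation_days)

start = date(2024, 4, 1)  # hypothetical experiment start
print(required_run_days(14, 7))                   # 21
print(is_analyzed(date(2024, 4, 14), start, 14))  # True: day 14 of allocation
print(is_analyzed(date(2024, 4, 15), start, 14))  # False: exposed too late
```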
Configure Analysis Period
This setting controls the dates from which Statsig collects metric data. It is useful for truncating the analysis window while continuing to run an experiment.
Allow cohort metrics to mature after experiment end
This setting allows cohort or baked metrics that take time to mature to continue collecting data after the end of an experiment. This is only recommended if an experiment is a one-time intervention.
For example, consider a 14-day experiment on new users that modifies a signup page. Removing the changes to the signup page doesn't impact users who already saw it. To maximize data for the "first-week revenue" metric, enable this setting so data continues to collect for users enrolled on day 14 until day 21, and for users enrolled on day 10 until day 17, even after the experiment ends on day 14.
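As a rough illustration of the example above, the sketch below computes the last day on which metric data is collected for a unit, with and without the setting. The function `collection_end_day` and its day-number convention are hypothetical and only mirror the numbers in the example.

```python
def collection_end_day(enroll_day, experiment_end_day, maturity_days,
                       allow_maturation):
    """Last experiment day on which metric data is collected for a unit.

    Days are numbered from the experiment start, as in the example above.
    """
    matured_day = enroll_day + maturity_days
    if allow_maturation:
        # Keep collecting until the cohort metric has fully matured.
        return matured_day
    # Otherwise collection stops when the experiment ends.
    return min(matured_day, experiment_end_day)

for enroll_day in (10, 14):
    print(enroll_day,
          collection_end_day(enroll_day, 14, 7, allow_maturation=True),
          collection_end_day(enroll_day, 14, 7, allow_maturation=False))
```

With the setting enabled, users enrolled on day 10 and day 14 are followed to day 17 and day 21; with it disabled, both are cut off at day 14.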
CURE covariates
This section lets you configure covariates for CURE. Configure strong defaults in the project settings, and use this section to add relevant/domain-specific covariates. Refer to the CURE documentation for more details.
Analysis settings
Analytics type
Whether to use Frequentist or Bayesian analysis. You can't change this once an experiment starts, to avoid cherry-picking methodology.
Apply Sequential Testing [Frequentist Only]
Controls whether sequential testing is applied. Statsig recommends this setting to avoid false positives from peeking.
Bonferroni/Benjamini-Hochberg [Frequentist Only]
Configures multiple-comparison corrections that control either the family-wise false positive rate (Bonferroni) or the false discovery rate (Benjamini-Hochberg). Refer to the more detailed documentation for Bonferroni and Benjamini-Hochberg.
Default Confidence Interval/Chance to Beat Threshold
The confidence level used for this experiment. The default is 95%, which is the recommended industry standard. Depending on the risk profile of the experiment, a stricter or less strict setting may be appropriate.
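To show how the confidence level interacts with the multiple-comparison corrections described earlier, here is a minimal sketch of the textbook Bonferroni and Benjamini-Hochberg procedures at alpha = 0.05 (a 95% confidence level). It illustrates the standard algorithms only; how Statsig applies them across metrics and variants is not shown here, and the function names and p-values are made up for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject when p <= alpha / m, where m is the number of comparisons."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= alpha * k / m,
    then reject the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            max_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[i] = True
    return rejected

p_values = [0.001, 0.012, 0.03, 0.2]  # made-up p-values for four metrics
print(bonferroni_reject(p_values))          # [True, True, False, False]
print(benjamini_hochberg_reject(p_values))  # [True, True, True, False]
```

In this example Bonferroni is the more conservative of the two: the third metric (p = 0.03) is significant under Benjamini-Hochberg but not under Bonferroni.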
Use Informative Priors [Bayesian Only]
For Bayesian experiments, whether to use informed priors in analysis, and the configuration for them.
Turbo Mode
Whether to use Turbo Mode to run experiment reloads more quickly.
Filter Exposures by Qualifying Event
This setting allows filtering exposures to experimental units that did (or did not) trigger a secondary event beyond exposure. This is useful for analysis-only experiments on web experimentation platforms that over-expose heavily. For end-to-end experiments, use Statsig to expose only at the point of intervention.
This setting has a few inputs:
- A qualifying event, which Statsig joins to exposures to determine if users triggered an event
- Exclude Matching Units: if checked, Statsig drops units that triggered this event from the analysis. If unchecked, Statsig filters the analysis to units that triggered the event.
- Use qualifying event timestamp for first exposures: if checked, Statsig replaces units' exposure timestamp with the qualifying event's timestamp. This is useful for small cohort windows, for example, measuring whether something happened within 10 minutes of the intervention. If the intervention occurred several minutes after the original exposure event, using the actual time the user saw the intervention can be helpful.
- Filter events by time window: only consider qualifying events for the inclusion/exclusion filters that occurred within a certain time from the exposure. This is useful if a user may return and re-trigger the qualifying event when it is no longer relevant to the exposure.
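The way these inputs combine can be sketched in Python. This is an illustration of the described behavior, not Statsig's query logic: the function `filter_exposures`, its parameters, and the data shapes are hypothetical, and the sketch assumes that qualifying events logged before the exposure do not count.

```python
from datetime import datetime, timedelta

def filter_exposures(exposures, qualifying_events, exclude_matching=False,
                     use_event_timestamp=False, window=None):
    """Return {unit_id: effective exposure time} after applying the filter.

    exposures: {unit_id: first exposure datetime}
    qualifying_events: list of (unit_id, datetime)
    """
    first_event = {}
    for unit_id, ts in sorted(qualifying_events, key=lambda e: e[1]):
        exposed_at = exposures.get(unit_id)
        if exposed_at is None or ts < exposed_at:
            continue  # assumption: events before exposure are ignored
        if window is not None and ts - exposed_at > window:
            continue  # outside the allowed time window
        first_event.setdefault(unit_id, ts)
    result = {}
    for unit_id, exposed_at in exposures.items():
        matched = unit_id in first_event
        if matched == exclude_matching:
            continue  # exclude mode drops matches; include mode drops the rest
        if matched and use_event_timestamp:
            result[unit_id] = first_event[unit_id]
        else:
            result[unit_id] = exposed_at
    return result

t0 = datetime(2024, 4, 10, 12, 0)
exposures = {"a": t0, "b": t0, "c": t0}
events = [("a", t0 + timedelta(minutes=5)), ("b", t0 + timedelta(days=2))]
print(sorted(filter_exposures(exposures, events, window=timedelta(hours=1))))
```

With a one-hour window only unit `a` qualifies, because unit `b` triggered the event two days after exposure and unit `c` never triggered it.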
Filter Assignment Source
This setting controls additional filters applied to the exposure data for this experiment. This can be useful to filter out bad dates with data known to be biased or non-representative, or to filter to a specific subset of interest for the scorecard results.
Default Date Filter
Statsig calculates results across multiple rollups, including Cumulative, 7 days, and 1 day. Cumulative is almost always the correct choice for this setting because it maximizes statistical power. You can select other views in the scorecard.
Explore (custom) query settings
Explore queries enable drilldown, filtering, grouping, and more advanced ways to cut metric data by user and metric properties. They provide a way to analyze experiment results in depth. Explore queries run with the same statistics as the scorecard; with no filters or other settings applied, results match the scorecard results.
Metrics
Pick the metrics to run the explore or drilldown analysis on. This can include tags and local metrics.
Group By
This setting is the primary tool for explore and custom queries. It allows drilldowns into user segments to understand differentiated performance. You can combine it with differential impact detection to analyze experiment results beyond topline averages.
By default, Statsig shows only the top 10 dimension levels with over 100 units per segment in results. The rest are grouped into an OTHER category. This prevents inflated false positive rates and keeps statistical analysis rigorous. Contact the Statsig team in Slack if you need this limit adjusted.
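The default grouping rule can be approximated with a short Python sketch. It assumes that "units per segment" means the total number of units at each dimension level and that levels are ranked by unit count; the real rule may count per experiment group. The function name `bucket_dimension_levels` is hypothetical.

```python
from collections import Counter

def bucket_dimension_levels(unit_levels, max_levels=10, min_units=100):
    """Keep the largest levels with more than `min_units` units, up to
    `max_levels` of them, and relabel every other level as OTHER.

    unit_levels: {unit_id: dimension level, e.g. a country code}
    """
    counts = Counter(unit_levels.values())
    eligible = [level for level, n in counts.most_common() if n > min_units]
    kept = set(eligible[:max_levels])
    return {unit: (level if level in kept else "OTHER")
            for unit, level in unit_levels.items()}

# Made-up data: 150 US units, 120 CA units, 5 FR units.
units = {f"u{i}": "US" for i in range(150)}
units.update({f"u{i}": "CA" for i in range(150, 270)})
units.update({f"u{i}": "FR" for i in range(270, 275)})
print(Counter(bucket_dimension_levels(units).values()))
```

Here FR falls below the unit threshold and is folded into OTHER, while US and CA remain as separate segments.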
Filter
Use filters to include or exclude certain cohorts of units in an explore analysis by specifying the property and a filter set.
Time range for metric data
This setting filters metric data to a specific date period, for example the three days from 2024-04-10 to 2024-04-12. This is useful for investigating data anomalies, analyzing a recent cohort when results have changed significantly, or running drill-down analyses.
Filter by exposure date
In some cases it is useful to filter to units that entered the experiment on a specific date, or to exclude them (for example, units exposed on a holiday may behave atypically). You can specify filters based on user exposure date.
By switching the mode to Days Since First Exposure, you can also drill down into each cohort's behavior during a specific window after exposure. This is a useful way to understand behaviors observed in the days-since-first-exposure timeline view.
Scheduling
After running a query, you can schedule it. Scheduled queries run daily after any scheduled Pulse load to update results. Scheduled queries can also be added to the experiment Summary page as part of the experiment writeup and downloadable summary.
Interaction effect detection
Explore queries can trigger interaction effect detection analyses to determine whether two experiments are interacting in a synergistic or harmful way.