Understanding Experiments
Understand how experiments are structured and run in Statsig Warehouse Native, including assignment, exposures, metric sources, and result analysis.
Running statistical analysis for an experiment can be complex. Statsig abstracts much of this complexity, which commonly leads to questions like the following:
- What do different date ranges mean?
- What metric data is included in the calculation?
This page gives an overview of the basic settings in Statsig and what they mean for the analysis that runs.
For more advanced settings, refer to experiment configuration.
Date ranges
The date range of an experiment controls the filters used when querying both exposure and metric data. For example, an experiment with a start date of 2025-02-01 and an end date of 2025-02-14 queries assignment data between those two dates, and also queries metric data between those two dates.
There are a few exceptions:
- Metric Bake Windows or Cohort Windows with "wait for..." enabled: cohorts that haven't been in the experiment for the full duration of their cohort or bake window are excluded from both the exposure count and the metric data for the metric in question.
- If "allow cohort metrics to bake after experiment end" is enabled: the end date of the analysis is artificially extended (for cohort metrics only) to maximize data collection for experiments with a one-time intervention, where the treatment effect is expected to persist after the treatment is turned off.
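The following Python sketch illustrates the date-range rule and the bake-window exception. It is not Statsig's implementation: the `Exposure` and `MetricRow` types, the `filter_exposures` and `filter_metric_rows` helpers, the inclusive date bounds, and the convention that a bake window of N days counts the exposure day as day 1 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Exposure:
    unit_id: str
    group: str
    first_exposure: date


@dataclass
class MetricRow:
    unit_id: str
    day: date
    value: float


def filter_exposures(exposures, start, end, bake_days=0, wait_for_bake=False):
    """Keep exposures inside [start, end] (inclusive bounds are an assumption).

    With wait_for_bake, also drop units whose bake window (assumed here to be
    bake_days days, counting the exposure day) extends past `end`.
    """
    kept = []
    for e in exposures:
        if not (start <= e.first_exposure <= end):
            continue
        if wait_for_bake and e.first_exposure + timedelta(days=bake_days - 1) > end:
            continue
        kept.append(e)
    return kept


def filter_metric_rows(rows, start, end):
    """Keep metric rows whose date falls inside [start, end]."""
    return [r for r in rows if start <= r.day <= end]


start, end = date(2025, 2, 1), date(2025, 2, 14)
exposures = [
    Exposure("u1", "control", date(2025, 2, 1)),
    Exposure("u2", "test", date(2025, 2, 13)),  # exposed 2 days before the end
    Exposure("u3", "test", date(2025, 2, 20)),  # exposed after the end date
]
metric_rows = [
    MetricRow("u1", date(2025, 1, 31), 1.0),  # before the start date
    MetricRow("u1", date(2025, 2, 5), 2.0),
]

print([e.unit_id for e in filter_exposures(exposures, start, end)])
print([e.unit_id for e in filter_exposures(exposures, start, end,
                                           bake_days=7, wait_for_bake=True)])
print([r.value for r in filter_metric_rows(metric_rows, start, end)])
```

Because the bake-window exclusion applies per metric, a sketch like this would call `filter_exposures` once per metric with that metric's own window.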
Exposure timing
Statsig includes a given experimental unit’s (for example, a user’s) data only after that unit triggers the experiment. For example, if you are testing a new notification, metric data from before a given user sees the notification shouldn't be included in the experimental analysis.
There is an exception: if a metric source uses a date column as its timestamp, Statsig doesn't include metric data from the day of exposure in the calculation. This happens because a timestamp cast from a date defaults to the first second of that date (00:00:00), which falls before the exposure time.
To address this, the metric source has a setting called Treat timestamp as date:
- If checked: Statsig includes day-0 data by converting the join condition to use the cast date (ignoring time).
- If unchecked: Statsig excludes day-0 data from the join between exposure and metric data, for the reason described above.
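To make the day-0 behavior concrete, here is a minimal Python sketch of the two join conditions. `joins_to_exposure` is a hypothetical helper, and treating a metric row at exactly the exposure time as post-exposure (`>=` rather than `>`) is an assumption.

```python
from datetime import date, datetime


def joins_to_exposure(metric_ts: datetime, exposure_ts: datetime,
                      treat_timestamp_as_date: bool) -> bool:
    """Return True if a metric row is attributed to the exposed unit (hypothetical)."""
    if treat_timestamp_as_date:
        # Compare calendar dates only, so rows from the day of exposure are kept.
        return metric_ts.date() >= exposure_ts.date()
    # Compare full timestamps: the row must occur at or after the exposure.
    return metric_ts >= exposure_ts


# A daily-grain row from a DATE column, cast to a timestamp, lands at midnight.
metric_ts = datetime.combine(date(2025, 2, 3), datetime.min.time())  # 00:00:00
exposure_ts = datetime(2025, 2, 3, 14, 30)  # unit exposed mid-afternoon

print(joins_to_exposure(metric_ts, exposure_ts, treat_timestamp_as_date=False))
print(joins_to_exposure(metric_ts, exposure_ts, treat_timestamp_as_date=True))
```

With the setting unchecked, the midnight timestamp precedes the 14:30 exposure and the day-0 row is dropped; with it checked, both sides compare as 2025-02-03 and the row is kept.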
Statsig recommends including day-0 data (checking Treat timestamp as date) if the data has been preprocessed at a daily grain.
Explore query dates
Explore queries can filter metric data or assignment data independently. The default filtering option is metric data. This allows you to exclude certain dates with buggy or non-representative data, or to scope an analysis to recent periods (for example, if your hypothesis is that the treatment effect will take a long time to develop).
Under the Advanced tab, you can also:
- filter to exposures inside (or outside) a date range
- filter to exposures within a certain cohort (e.g. X-Y days since exposure)
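A rough Python sketch of how these filters could combine on a table of joined exposure and metric rows. The row shape, the `explore_filter` helper, and its parameters are hypothetical; the cohort filter is interpreted here as keeping metric rows that fall X to Y days after the unit's exposure, which is an assumption about what the setting does.

```python
from datetime import date


def _within(value, lo, hi):
    """Inclusive range check; a bound of None is treated as unbounded."""
    return (lo is None or value >= lo) and (hi is None or value <= hi)


def explore_filter(rows, metric_range=(None, None), exposure_range=(None, None),
                   exclude_exposure_range=False, days_since_range=(None, None)):
    """Filter joined rows, each a dict with 'exposure_day' and 'metric_day' dates.

    metric_range           -- keep metric rows dated inside this range
    exposure_range         -- keep units exposed inside this range...
    exclude_exposure_range -- ...or, if True, units exposed outside it
    days_since_range       -- keep metric rows X to Y days after exposure
    """
    kept = []
    for r in rows:
        if not _within(r["metric_day"], *metric_range):
            continue
        # Skip when membership in the exposure range doesn't match the mode.
        if _within(r["exposure_day"], *exposure_range) == exclude_exposure_range:
            continue
        days_since = (r["metric_day"] - r["exposure_day"]).days
        if not _within(days_since, *days_since_range):
            continue
        kept.append(r)
    return kept


rows = [
    {"unit_id": "u1", "exposure_day": date(2025, 2, 1), "metric_day": date(2025, 2, 1)},
    {"unit_id": "u1", "exposure_day": date(2025, 2, 1), "metric_day": date(2025, 2, 9)},
    {"unit_id": "u2", "exposure_day": date(2025, 2, 10), "metric_day": date(2025, 2, 12)},
]

# Only metric data from 2025-02-05 onward.
print([r["metric_day"] for r in explore_filter(rows, metric_range=(date(2025, 2, 5), None))])
# Only metric data 7 to 14 days after each unit's exposure.
print([r["metric_day"] for r in explore_filter(rows, days_since_range=(7, 14))])
```

Keeping the metric-date filter separate from the exposure-date filter mirrors the point above that the two can be scoped independently.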
Selecting dates
When using incremental reloads, whether or not turbo mode is enabled, you can use the date picker next to Pulse to view Pulse results "as of" a certain date. This is useful for reviewing historical discussion of an experiment or understanding how results have evolved over time. You can also view this data through the time series views in Statsig.
Do not use this feature to cherry-pick dates with favorable results.
How experiment calculations work
For every experiment analysis, the basic flow is:
- Identifying when units first saw, or were enrolled in, the experiment.
- Identifying what those units did (metric data: events or other rollups) after seeing the experimental variant they were assigned to, up until the end of the experiment. This results in a tagged dataset where metric data is associated with a group.
- Aggregating that metric data over the experiment duration. Go to the metrics documentation for descriptions and SQL snippets describing this step.
- Calculating group-level statistics to be used in the final scorecard analysis. At a high level, the analysis requires the observed totals or means per group, plus their variance, which is used to judge how meaningful the difference between groups is (typically statistical significance in frequentist analysis, or probability of best in Bayesian analysis).
- Means are normalized per unit. For basic aggregations, this imputes 0 for exposed units with no metric data: the mean is the total value divided by the count of exposed units. For ratio and mean metrics, Statsig computes the mean as the sum of the numerator divided by the sum of the denominator.
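The flow above can be sketched end to end in Python for a basic sum metric. This is a simplified illustration, not Statsig's implementation: the helper names, the tuple-based input logs, the use of daily dates, and the unpooled z statistic (standing in for "statistical significance in frequentist analysis") are illustrative choices, and the sketch omits any variance adjustments Statsig may apply.

```python
import math
from collections import defaultdict
from datetime import date


def first_exposures(exposure_log):
    """Step 1: earliest exposure day and assigned group per unit."""
    first = {}
    for unit_id, group, day in exposure_log:
        if unit_id not in first or day < first[unit_id][1]:
            first[unit_id] = (group, day)
    return first


def unit_totals(first, metric_log, end):
    """Steps 2-3: keep post-exposure rows up to `end` and sum them per unit.

    Exposed units with no metric rows keep a total of 0 (zero imputation).
    """
    totals = {u: 0.0 for u in first}
    for unit_id, day, value in metric_log:
        if unit_id in first and first[unit_id][1] <= day <= end:
            totals[unit_id] += value
    return totals


def group_stats(first, totals):
    """Step 4: per-group unit count, mean, and sample variance."""
    by_group = defaultdict(list)
    for unit_id, (group, _) in first.items():
        by_group[group].append(totals[unit_id])
    stats = {}
    for group, vals in by_group.items():
        n = len(vals)
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / (n - 1) if n > 1 else 0.0
        stats[group] = {"n": n, "mean": mean, "var": var}
    return stats


def z_score(stats, test="test", control="control"):
    """Unpooled z statistic for the difference in group means (simplified)."""
    t, c = stats[test], stats[control]
    se = math.sqrt(t["var"] / t["n"] + c["var"] / c["n"])
    return (t["mean"] - c["mean"]) / se


def ratio_mean(numerators, denominators):
    """Ratio and mean metrics: sum of the numerator over sum of the denominator."""
    return sum(numerators) / sum(denominators)


exposure_log = [
    ("u1", "control", date(2025, 2, 1)),
    ("u1", "control", date(2025, 2, 3)),  # repeat exposure; the first one counts
    ("u2", "control", date(2025, 2, 2)),
    ("u3", "test", date(2025, 2, 1)),
    ("u4", "test", date(2025, 2, 2)),
]
metric_log = [
    ("u1", date(2025, 1, 31), 5.0),  # before u1's exposure: dropped
    ("u1", date(2025, 2, 2), 2.0),
    ("u3", date(2025, 2, 1), 3.0),
    ("u4", date(2025, 2, 5), 5.0),
]  # u2 has no rows, so its total is imputed as 0

first = first_exposures(exposure_log)
totals = unit_totals(first, metric_log, end=date(2025, 2, 14))
stats = group_stats(first, totals)
print(stats)
print(z_score(stats))
```

Here control averages (2 + 0) / 2 = 1.0 and test averages (3 + 5) / 2 = 4.0; dropping u2 instead of imputing its 0 would inflate the control mean, which is why the denominator is the count of exposed units.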