Understanding Experiments

Understand how experiments are structured and run in Statsig Warehouse Native, including assignment, exposures, metric sources, and result analysis.

Statsig Warehouse Native structures an experiment around date ranges and exposure timing, then turns each unit's exposures and metric data into scorecard results. Statsig abstracts much of the statistical complexity of experiment analysis, which commonly leads to questions like the following:

What do different date ranges mean?
What metric data is included in the calculation?

This page gives an overview of the basic settings in Statsig and what they mean for the analysis that runs.

For more advanced settings, refer to experiment configuration.

Date ranges

The date range of an experiment controls the filters used when querying both exposure and metric data. For example, an experiment with a start date of 2025-02-01 and an end date of 2025-02-14 queries both assignment data and metric data between those two dates.

There are a few exceptions:

Metric Bake Windows or Cohort Windows with wait for... enabled: Statsig excludes cohorts that haven't been in the experiment for the duration of their cohort/bake window. Statsig excludes these cohorts from both the exposure count and the metric data for the metric in question.
If allow cohort metrics to bake after experiment end is enabled: Statsig extends the analysis end date, for cohort metrics only, so that those metrics can keep collecting data. This extension applies to experiments with a one-time intervention where you expect the experiment effect to persist after you turn off the treatment.
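
The date-range filter and the bake-window exception can be sketched in a few lines of Python. This is an illustration only, not Statsig's implementation: the function names, the sample data, the inclusive end date, and the rule that a unit has "baked" once its exposure date plus the window length falls on or before the analysis end are all assumptions.

```python
from datetime import date, timedelta


def in_experiment_range(day: date, start: date, end: date) -> bool:
    """True if `day` is inside the experiment date range (inclusive bounds assumed)."""
    return start <= day <= end


def has_baked(exposure_day: date, window_days: int, analysis_end: date) -> bool:
    """True if the unit has been exposed for the whole window (assumed rule)."""
    return exposure_day + timedelta(days=window_days) <= analysis_end


start, end = date(2025, 2, 1), date(2025, 2, 14)

# Hypothetical first-exposure dates per unit.
exposures = {
    "u1": date(2025, 2, 2),
    "u2": date(2025, 2, 12),
    "u3": date(2025, 1, 30),  # before the start date, so filtered out
}

# Keep only exposures inside the experiment date range.
in_range = {u: d for u, d in exposures.items() if in_experiment_range(d, start, end)}

# For a metric with a 7-day window, keep only units whose window has completed.
baked = sorted(u for u, d in in_range.items() if has_baked(d, 7, end))
```

Under these assumptions, u2 is exposed inside the date range but is left out of the 7-day metric because its window has not completed by the end date.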

Exposure timing

Statsig includes a given experimental unit’s (for example, a user’s) data only after that unit triggers the experiment. For example, if you're testing a new notification, Statsig shouldn't include metric data from before a given user sees the notification in the experimental analysis.

There is an exception: if a metric source uses a date column as its timestamp, Statsig doesn't include metric data from the day of exposure in the calculation. This happens because a timestamp cast from a date defaults to the first second of that date (00:00:00), which falls before the exposure time.

To address this, the metric source has a setting called Treat timestamp as date:

If checked: Statsig includes day-0 data by converting the join condition to use the cast date (ignoring time).
If unchecked: Statsig excludes day-0 data from the join between exposure and metric for the reason described above.

Statsig recommends including day-0 data (checking Treat timestamp as date) if you've preprocessed the data at a daily grain.
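
The day-0 behavior follows directly from how a date casts to a timestamp. The Python sketch below reproduces the comparison; the `joins` function and its `treat_timestamp_as_date` parameter are hypothetical stand-ins for the join condition and the metric source setting, not Statsig's actual SQL.

```python
from datetime import date, datetime, time

# A unit is first exposed mid-afternoon on 2025-02-01.
exposure_ts = datetime(2025, 2, 1, 14, 30)

# The metric source stores a date only; casting it gives midnight of that day.
metric_day = date(2025, 2, 1)
metric_ts = datetime.combine(metric_day, time.min)  # 2025-02-01 00:00:00


def joins(metric_ts: datetime, exposure_ts: datetime, treat_timestamp_as_date: bool) -> bool:
    """Return True if the metric row would be attributed to the exposed unit."""
    if treat_timestamp_as_date:
        # Compare calendar dates only, ignoring the time of day.
        return metric_ts.date() >= exposure_ts.date()
    # Compare full timestamps: midnight falls before the 14:30 exposure.
    return metric_ts >= exposure_ts


unchecked = joins(metric_ts, exposure_ts, treat_timestamp_as_date=False)
checked = joins(metric_ts, exposure_ts, treat_timestamp_as_date=True)
```

Here `unchecked` is `False` (the day-0 row is dropped) and `checked` is `True` (the day-0 row is kept).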

Explore query dates

Explore queries can filter metric data or assignment data independently. The default filtering option is metric data. This filtering allows you to exclude certain dates with buggy or non-representative data, or to scope an analysis to recent periods. For example, you might scope to recent periods if your hypothesis is that the treatment effect takes a long time to develop.

Under the advanced tab, it's also possible to:

Filter to exposures inside (or outside) a date range
Filter to exposures within a certain cohort (for example, X to Y days since exposure)
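
As a rough illustration of these two exposure filters, the sketch below applies them to a small dictionary of first-exposure dates. The function names are hypothetical, the bounds are assumed to be inclusive, and "days since exposure" is interpreted here as the age of the exposure relative to a chosen as-of date; Statsig's own definitions may differ.

```python
from datetime import date


def filter_by_exposure_range(exposures, start, end, inside=True):
    """Keep units first exposed inside [start, end], or outside it when inside=False."""
    return {u: d for u, d in exposures.items() if (start <= d <= end) == inside}


def filter_by_days_since_exposure(exposures, as_of, min_days, max_days):
    """Keep units whose exposure is between min_days and max_days old on `as_of`."""
    return {
        u: d for u, d in exposures.items()
        if min_days <= (as_of - d).days <= max_days
    }


# Hypothetical first-exposure dates per unit.
exposures = {
    "u1": date(2025, 2, 1),
    "u2": date(2025, 2, 8),
    "u3": date(2025, 2, 13),
}

first_week = filter_by_exposure_range(exposures, date(2025, 2, 1), date(2025, 2, 7))
not_first_week = filter_by_exposure_range(
    exposures, date(2025, 2, 1), date(2025, 2, 7), inside=False
)
cohort = filter_by_days_since_exposure(exposures, date(2025, 2, 14), 3, 10)
```

With this data, `first_week` contains only u1, `not_first_week` contains u2 and u3, and `cohort` contains only u2 (exposed 6 days before the as-of date).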

Selecting dates

When you use incremental reloads, with or without turbo mode, you can use the date picker next to Pulse to view Pulse results "as of" a certain date. This is useful for reviewing historical discussion of an experiment or understanding how results have evolved over time. You can also view this data through the time series views in Statsig.

Don't use this feature to cherry-pick dates with favorable results.

How experiment calculations work

For every experiment analysis, the basic flow is:

Identifying when units first saw or enrolled into the experiment.
Identifying what those units did (metric data: events or other rollups) after seeing the experimental variant Statsig assigned them to, up until the end of the experiment. This results in a tagged dataset where Statsig associates metric data with a group.
Aggregating that metric data over the experiment duration. Go to the metrics documentation for descriptions and SQL snippets describing this step.
Calculating group-level statistics for use in the final scorecard analysis. At a high level, the analysis requires the observed totals/means per group, and the variance, which Statsig uses to understand how meaningful the difference between groups is. This measure of meaningfulness is typically statistical significance in frequentist analysis, or probability of best in Bayesian analysis.
- Statsig normalizes means per unit. For basic aggregations, this imputes 0s for exposed units with no metric data: the mean is the total value divided by the count of units exposed. For ratios and means, Statsig computes the value as the sum of the numerator over the sum of the denominator.
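
The four steps above can be sketched end to end in plain Python. This is a simplified illustration under stated assumptions, not Statsig's pipeline: the sample data, variable names, and `ratio_metric` helper are hypothetical, the metric is a simple sum, and the variance shown is the ordinary sample variance of per-unit totals.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, variance

# Hypothetical data: (unit, group, exposure timestamp); repeat exposures are allowed.
exposures = [
    ("u1", "control", datetime(2025, 2, 1, 9)),
    ("u1", "control", datetime(2025, 2, 3, 9)),
    ("u2", "test", datetime(2025, 2, 2, 12)),
    ("u3", "test", datetime(2025, 2, 4, 8)),
    ("u4", "control", datetime(2025, 2, 1, 10)),
]
# Hypothetical metric events: (unit, timestamp, value).
events = [
    ("u1", datetime(2025, 1, 31, 10), 5.0),  # before first exposure: dropped
    ("u1", datetime(2025, 2, 2, 10), 2.0),
    ("u2", datetime(2025, 2, 5, 10), 4.0),
    ("u2", datetime(2025, 2, 20, 10), 9.0),  # after experiment end: dropped
    ("u4", datetime(2025, 2, 3, 10), 6.0),
]
experiment_end = datetime(2025, 2, 14, 23, 59, 59)

# Step 1: find each unit's first exposure and its group.
first_exposure = {}
for unit, group, ts in exposures:
    if unit not in first_exposure or ts < first_exposure[unit][1]:
        first_exposure[unit] = (group, ts)

# Steps 2 and 3: keep events between first exposure and experiment end,
# then sum per unit. Starting every exposed unit at 0.0 imputes zeros.
unit_totals = {unit: 0.0 for unit in first_exposure}
for unit, ts, value in events:
    if unit in first_exposure and first_exposure[unit][1] <= ts <= experiment_end:
        unit_totals[unit] += value

# Step 4: group-level statistics over per-unit totals.
values_by_group = defaultdict(list)
for unit, total in unit_totals.items():
    values_by_group[first_exposure[unit][0]].append(total)

stats = {
    group: {
        "units": len(vals),
        "total": sum(vals),
        "mean": mean(vals),          # total value over count of exposed units
        "variance": variance(vals),  # sample variance (an assumption here)
    }
    for group, vals in values_by_group.items()
}


def ratio_metric(numerators, denominators):
    """Ratio-style mean: sum of numerators over sum of denominators."""
    return sum(numerators) / sum(denominators)
```

In this sample, u3 has no events but still counts toward the test group's unit count, so the test mean is 4.0 / 2 = 2.0 rather than 4.0.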
