Pulse FAQs

Frequently asked questions about interpreting Statsig Warehouse Native experiment results, including p-values, intervals, exposures, and SRM warnings.

Interpreting statistical results can be difficult, and many users have similar questions. This page covers the most common ones.

I had a stat sig result, but it turned negative. How should I interpret this?

In general, trust the current result, because it incorporates more information about the users in your experiment.

There are several reasons this can happen:

- Random noise, which gets diluted as your sample size gets larger.
- Within-week seasonality (e.g. an effect is different on Mondays), which gets normalized with more data.
- The users who saw the experiment early are different from slower adopters. A daily user likely sees your experiment before someone who uses your product once a month. Use the time series view for more insight on these adoption differences.
- A novelty effect made the experiment meaningful early on, but the effect faded. For example, after changing a button, users may click it out of curiosity at first, then revert to prior behavior. Use the days-since-exposure view for more insight on novelty effects.

Best practice is to set a readout date when you launch your experiment, based on a power analysis, and to disregard the statistical interpretation of results until that date. Reading results multiple times before the readout date dramatically increases the rate of false positives.
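To illustrate why repeated reads inflate false positives, here is a minimal simulation sketch. It is not how Pulse computes results: it assumes a plain two-sample z-test, standard normal metric values, and hypothetical settings (14 days, 50 users per group per day). Every simulated experiment is an A/A test, so any stat sig result is a false positive.

```python
import math
import random
from statistics import NormalDist


def z_statistic(sum_c, sumsq_c, sum_t, sumsq_t, n):
    """Two-sample z statistic for two groups of n users each, from running sums."""
    mean_c, mean_t = sum_c / n, sum_t / n
    var_c = (sumsq_c - n * mean_c ** 2) / (n - 1)
    var_t = (sumsq_t - n * mean_t ** 2) / (n - 1)
    return (mean_t - mean_c) / math.sqrt(var_c / n + var_t / n)


def simulate_aa_tests(n_experiments=400, days=14, users_per_day=50, alpha=0.05, seed=7):
    """Return (false-positive rate when checking daily, rate at one final readout)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    peeking_hits = 0
    readout_hits = 0
    for _ in range(n_experiments):
        sum_c = sumsq_c = sum_t = sumsq_t = 0.0
        n = 0
        sig_on_any_day = False
        z = 0.0
        for _day in range(days):
            # Both groups draw from the same distribution: there is no true effect.
            for _user in range(users_per_day):
                c = rng.gauss(0.0, 1.0)
                t = rng.gauss(0.0, 1.0)
                sum_c += c
                sumsq_c += c * c
                sum_t += t
                sumsq_t += t * t
            n += users_per_day
            z = z_statistic(sum_c, sumsq_c, sum_t, sumsq_t, n)
            if abs(z) > z_crit:
                sig_on_any_day = True
        peeking_hits += sig_on_any_day       # "stat sig" on at least one daily check
        readout_hits += abs(z) > z_crit      # "stat sig" on the final day only
    return peeking_hits / n_experiments, readout_hits / n_experiments


peeking_rate, readout_rate = simulate_aa_tests()
print(f"False-positive rate, checking every day: {peeking_rate:.1%}")
print(f"False-positive rate, single readout:     {readout_rate:.1%}")
```

The single-readout rate should land near the 5% significance level, while the rate from checking every day is noticeably higher, because each extra check is another chance for noise to cross the threshold.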

How should I start interpreting results?

Start by using your scorecard metrics to understand whether you moved the metrics you expected to move. Before reviewing Pulse, form a hypothesis about what your experiment should drive. Your primary metrics should answer that hypothesis.

The delta displayed is the observed difference between the test and control populations. The error bars visualize a confidence interval: a range of plausible values for the true difference between groups. If you repeated the experiment many times, about 95% of the resulting 95% confidence intervals would contain the true value. In practice, treat the CI as a representative range of what the true value might be.
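As a rough illustration of how a delta and its confidence interval relate, the sketch below computes an absolute difference in means with a normal-approximation interval. This is an assumption-laden simplification, not Statsig's implementation: the function name `delta_with_ci` and the sample values are hypothetical, and Pulse may report relative deltas or use a different variance method.

```python
import math
from statistics import NormalDist, mean, variance


def delta_with_ci(control, test, confidence=0.95):
    """Absolute difference in means (test - control) with a normal-approximation CI."""
    delta = mean(test) - mean(control)
    # Standard error of the difference between two independent sample means.
    std_err = math.sqrt(variance(test) / len(test) + variance(control) / len(control))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return delta, delta - z * std_err, delta + z * std_err


# Hypothetical per-user metric values.
control_values = [10.0, 12.0, 9.0, 11.0, 13.0, 10.0, 12.0, 11.0]
test_values = [12.0, 14.0, 11.0, 13.0, 15.0, 12.0, 13.0, 14.0]

delta, ci_low, ci_high = delta_with_ci(control_values, test_values)
is_stat_sig = ci_low > 0 or ci_high < 0  # the interval excludes zero
print(f"delta={delta:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f}), stat sig={is_stat_sig}")
```

When the interval excludes zero, the result is stat sig at the matching significance level; when it includes zero, it is not.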

These results are statistical interpretations, not facts:

- If a result isn't stat sig, you don't have sufficient evidence to reject the null hypothesis. Based on your experiment design, the observed result is reasonably likely to have happened by chance.
  - Generally, you should treat these results as a lack of evidence for your hypothesis, not as proof that there is no effect.
  - Underpowered tests may lead to neutral results even if a true effect exists.
- If a result is stat sig, you have sufficient evidence to reject the null hypothesis. In other words, if there were no true difference between the two groups, the probability of observing a result at least this extreme would be below the significance threshold you set in advance.
  - Generally, you should treat this result as evidence for your hypothesis.
  - Multiple comparisons (many metrics, rerunning an experiment, or grouping by dimensions) greatly increase the chance of seeing a stat sig result when there's not a true effect. Be wary of interpreting results in those situations.
  - A test that was extremely unlikely to succeed (such as a moonshot) with a stat sig result has a high chance of being a false positive. The stat sig result is still a strong signal, but consider reproducing the result, running a back-test, or using a stricter (lower) significance level.
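The two cautions about stat sig results can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes independent metrics, a significance level of 0.05, and 80% power. The function names and the prior probabilities are hypothetical choices for illustration.

```python
def chance_of_any_false_positive(num_comparisons, alpha=0.05):
    """Probability that at least one of several independent comparisons on a
    change with no true effect comes out stat sig."""
    return 1 - (1 - alpha) ** num_comparisons


def prob_effect_is_real(prior, power=0.8, alpha=0.05):
    """Bayes' rule: probability that a stat sig result reflects a true effect,
    given a prior probability that the change works at all."""
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return true_positives / (true_positives + false_positives)


print(f"20 metrics, no true effect: {chance_of_any_false_positive(20):.0%} chance of a stat sig result")
print(f"Moonshot (1% prior): {prob_effect_is_real(0.01):.0%} chance the win is real")
print(f"Coin-flip idea (50% prior): {prob_effect_is_real(0.50):.0%} chance the win is real")
```

Under these assumptions, checking 20 metrics gives roughly a 64% chance of at least one false positive, and a stat sig moonshot with a 1% prior is real only about 14% of the time, compared with about 94% for an idea with a 50% prior.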

After reviewing the scorecard results, use the all-metrics tab and custom queries for more information about your experiment. Examining more metrics increases the chance of a false positive, so a stat sig movement in those views isn't necessarily reliable evidence of a true effect. Use that section to look for unexpected large regressions and to generate follow-up hypotheses.

Results are missing for some metrics

Missing results typically happen when your organization uses both the SDK or event imports and precomputed metrics imported from your data warehouse. Because these pipelines can run at different times, data availability may differ. Adjust your analysis date range to get a full view of your data.

Your external source shows more exposure events than Statsig. Are data missing?

Statsig doesn't count exposures on the last day (the day you made a decision). Filter out that day when you analyze your external data. The hours that define a "day" for your project depend on the timezone you assigned to your project.
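When reconciling counts, one way to apply this filter to external data is sketched below. The function name, the sample timestamps, and the fixed UTC-8 offset are hypothetical. A real project would use its configured timezone (for example via `zoneinfo.ZoneInfo`), which also handles daylight saving time.

```python
from datetime import date, datetime, timedelta, timezone


def count_exposures_before_decision_day(exposure_times, decision_day, project_tz):
    """Count exposures whose calendar day, in the project's timezone, falls
    before the decision day."""
    return sum(
        1 for ts in exposure_times
        if ts.astimezone(project_tz).date() < decision_day
    )


# Hypothetical project timezone: a fixed UTC-8 offset.
project_tz = timezone(timedelta(hours=-8))
decision_day = date(2024, 3, 10)

# Hypothetical exposure timestamps from an external source, stored in UTC.
exposures_utc = [
    datetime(2024, 3, 9, 20, 0, tzinfo=timezone.utc),   # Mar 9, 12:00 project time
    datetime(2024, 3, 10, 5, 0, tzinfo=timezone.utc),   # Mar 9, 21:00 project time
    datetime(2024, 3, 10, 9, 0, tzinfo=timezone.utc),   # Mar 10, 01:00 project time
    datetime(2024, 3, 10, 23, 0, tzinfo=timezone.utc),  # Mar 10, 15:00 project time
]

comparable_count = count_exposures_before_decision_day(exposures_utc, decision_day, project_tz)
print(f"Exposures to compare against Statsig: {comparable_count} of {len(exposures_utc)}")
```

Note that the second timestamp falls on March 10 in UTC but on March 9 in the project's timezone, so filtering by UTC date would exclude it incorrectly.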

You log categorical metadata for a custom event, but Pulse doesn't show these breakouts. What's wrong?

Pulse shows experimental results for metric sub-groups (for example, iOS vs. Android) only when you configure your metadata as a Dimension. Value Dimensions are the most common dimension type because their metadata is logged directly with your custom events. Define value dimensions in your custom event setup.

Why do I see "No dimensions available for this time range"?

This error appears when you try to view precomputed user dimensions, particularly after the first reload of the day. The error occurs because:

- Statsig loads dimensions asynchronously in separate explore queries after the main scorecard results load.
- The main experiment results appear first, while dimensions continue loading in the background.
- Dimensions are typically available within a few minutes after the main scorecard loads.

If you encounter this error, wait a few minutes and refresh the page to check whether the dimensions have finished loading.
