Pulse FAQs
Frequently asked questions about interpreting Statsig Warehouse Native experiment results, including p-values, intervals, exposures, and SRM warnings.
Interpreting statistical results can be difficult, and many users have similar questions. This page covers the most common ones.
I had a stat sig result, but it turned negative. How should I interpret this?
In general, trust the current result, because it incorporates more information about the users in your experiment.
There are several reasons this can happen:
- Random noise, which gets diluted as your sample size gets larger
- Within-week seasonality (e.g. an effect is different on Mondays), which gets normalized with more data
- The users who saw the experiment early are different from slower adopters. A daily user will likely see your experiment before someone who uses your product once a month. Use the time series view for more insight on this.
- A novelty effect made the experiment meaningful early on, but the effect faded. For example, after changing a button, users may click it out of curiosity at first, then revert to prior behavior. Use the days-since-exposure view for more insight on this.
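The first point, noise shrinking as sample size grows, can be illustrated with a minimal sketch. It assumes a simple difference in means between two equal-sized groups with the same standard deviation and a normal approximation; the function name `ci_half_width` and the numbers are hypothetical, and this is not a description of how Statsig computes its intervals.

```python
import math


def ci_half_width(std_dev: float, users_per_group: int, z: float = 1.96) -> float:
    """Approximate 95% CI half-width for a difference in means.

    Assumes two equal-sized groups with the same standard deviation
    and a normal approximation (illustrative only).
    """
    standard_error = std_dev * math.sqrt(2.0 / users_per_group)
    return z * standard_error


# Each 4x increase in users per group halves the interval width.
for n in (1_000, 4_000, 16_000):
    print(n, round(ci_half_width(std_dev=10.0, users_per_group=n), 3))
```

With few users the interval is wide, so an early stat sig result can sit close to the boundary and move as more data arrives.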
How should I start interpreting results?
Start by using your scorecard metrics to understand whether you moved the metrics you expected to move. Before reviewing Pulse, form a hypothesis about what your experiment should drive. Your primary metrics should answer that hypothesis.
The delta displayed is based on the observed difference between test and control populations. The error bars visualize a confidence interval (CI), which is a range of plausible values for the true difference between groups. If you repeated the experiment many times, about 95% of the resulting 95% confidence intervals would contain the true value. In practice, treat the CI as a representative range of what the true value might be.
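As a rough illustration of how a delta and its confidence interval relate, here is a minimal sketch that assumes an absolute difference in means with an unpooled (Welch-style) standard error and a normal approximation. The function name `delta_with_ci` and the sample values are hypothetical, and Statsig's actual calculation may differ (for example, relative deltas or variance reduction).

```python
import math
from statistics import mean, variance


def delta_with_ci(control, test, z=1.96):
    """Return (delta, lower, upper) for the difference in means, test minus control.

    Uses an unpooled standard error and a normal approximation;
    z=1.96 corresponds to a 95% confidence level.
    """
    delta = mean(test) - mean(control)
    standard_error = math.sqrt(
        variance(test) / len(test) + variance(control) / len(control)
    )
    return delta, delta - z * standard_error, delta + z * standard_error


# Tiny made-up samples, kept short for readability only.
control_values = [1.0, 2.0, 3.0, 4.0, 5.0]
test_values = [2.0, 3.0, 4.0, 5.0, 6.0]

delta, lower, upper = delta_with_ci(control_values, test_values)
is_stat_sig = lower > 0 or upper < 0  # the interval excludes zero
print(delta, lower, upper, is_stat_sig)
```

Here the observed delta is positive, but the interval spans zero, so the result would not be stat sig at the 95% level.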
These results are statistical interpretations, not facts:
- If a result is not stat sig, this means you don't have sufficient evidence to reject the null hypothesis (i.e., based on your experiment design, the observed result is reasonably likely to have happened by chance).
  - Generally, you should treat these results as a lack of evidence for your hypothesis.
  - Underpowered tests may lead to neutral results even if a true effect exists.
- If a result is stat sig, this means that you have sufficient evidence to reject the null hypothesis (i.e., the probability that you would observe this result, or one more extreme, if there were no true difference between the two groups is below the pre-determined threshold you set).
  - Generally, you should treat this result as evidence for your hypothesis.
  - Multiple comparisons (many metrics, rerunning an experiment, or grouping by dimensions) greatly increase the chance of seeing a stat sig result when there's no true effect. Be wary of interpreting results when you see those behaviors!
  - A test that was extremely unlikely to succeed (such as a moonshot) with a stat sig result has a high chance of being a false positive. The result is still a strong signal, but consider reproducing it, running a back-test, or lowering your significance level.
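The last two cautions can be made concrete with a minimal sketch. It assumes independent comparisons and uses a simple Bayes' rule calculation; the function names and the `prior` and `power` values are hypothetical inputs chosen for illustration, not quantities that Pulse reports.

```python
def familywise_false_positive_rate(alpha: float, num_comparisons: int) -> float:
    """Chance of at least one false positive across independent comparisons
    when no comparison has a true effect."""
    return 1 - (1 - alpha) ** num_comparisons


def false_positive_risk(prior: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Probability that a stat sig result is a false positive.

    prior: assumed probability, before the test, that a true effect exists.
    power: assumed probability of detecting the effect if it exists.
    """
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return false_positives / (true_positives + false_positives)


# Checking 20 independent metrics at alpha = 0.05:
print(round(familywise_false_positive_rate(0.05, 20), 3))

# A moonshot with an assumed 1% chance of a real effect, versus a coin-flip idea:
print(round(false_positive_risk(prior=0.01), 3))
print(round(false_positive_risk(prior=0.50), 3))
```

Under these assumptions, looking at 20 metrics gives roughly a 64% chance of at least one false positive, and a stat sig result on the 1% moonshot is more likely a false positive than a real effect.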
After reviewing the scorecard results, use the all-metrics tab and custom queries for more information about your experiment. Examining more metrics increases the chance of a false positive, so a statistically significant movement in those views does not necessarily reflect a real effect. Use that section to look for unexpected large regressions and to generate follow-up hypotheses.
Results are missing for some metrics
This typically happens when your organization uses both the SDK or event imports and precomputed metrics imported from your data warehouse. Because these pipelines can run at different times, data availability may differ. Adjust your analysis date range to get a full view of your data.
Your external source shows more exposure events than Statsig. Are data missing?
Statsig doesn't count exposures on the last day (the day you made a decision). Filter out that day when you analyze your external data. The hours that define a "day" for your project depend on the timezone you assigned to your project.
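A minimal sketch of that filter for externally stored exposure timestamps follows. It assumes timestamps are timezone-aware UTC `datetime` values; the function name `drop_decision_day`, the sample data, and the fixed UTC-8 offset standing in for the project timezone are all hypothetical. For a real named timezone you could pass a `zoneinfo.ZoneInfo` object instead.

```python
from datetime import datetime, timedelta, timezone


def drop_decision_day(exposure_times, decision_time, project_tz):
    """Drop exposures that fall on the decision day, as defined in the project timezone."""
    decision_day = decision_time.astimezone(project_tz).date()
    return [
        t for t in exposure_times
        if t.astimezone(project_tz).date() != decision_day
    ]


# Hypothetical project timezone: a fixed UTC-8 offset.
project_tz = timezone(timedelta(hours=-8))

# Decision made at 18:00 UTC on March 10, which is 10:00 on March 10 in the project timezone.
decision_time = datetime(2024, 3, 10, 18, 0, tzinfo=timezone.utc)

exposures = [
    datetime(2024, 3, 9, 12, 0, tzinfo=timezone.utc),  # March 9 locally: kept
    datetime(2024, 3, 10, 5, 0, tzinfo=timezone.utc),  # 21:00 on March 9 locally: kept
    datetime(2024, 3, 10, 9, 0, tzinfo=timezone.utc),  # 01:00 on March 10 locally: dropped
]

comparable = drop_decision_day(exposures, decision_time, project_tz)
print(len(comparable))
```

Note that the second exposure falls on March 10 in UTC but on March 9 in the project timezone, which is why the day boundary must be computed in the project timezone before filtering.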
You log categorical metadata for a custom event, but Pulse doesn't show these breakouts. What's wrong?
Pulse shows experimental results for metric sub-groups (for example, iOS vs. Android) only when you configure your metadata as a Dimension. Value Dimensions are the most common dimension type because their metadata is logged directly with your custom events. Define value dimensions in your custom event setup.
Why do I see "No dimensions available for this time range"?
This error appears when you try to view precomputed user dimensions, particularly after the first reload of the day. This happens because:
- Statsig loads dimensions asynchronously in separate explore queries after the main scorecard results load.
- The main experiment results appear first, while dimensions continue loading in the background.
- Dimensions are typically available within a few minutes after the main scorecard loads.
If you encounter this error, wait a few minutes and refresh the page to check whether the dimensions have finished loading.