Frequently Asked Questions on Using Pulse

Frequently asked questions about interpreting Statsig experiment results, including p-values, confidence intervals, lift, exposures, and SRM warnings.

This page provides answers to common questions about interpreting statistical results.

I had a stat sig result, but it turned negative. How should I interpret this?

Trust the current result, as it incorporates more information about the users in your experiment.

This can happen for several reasons:

  • Random noise, which is reduced as sample size grows
  • Within-week seasonality (for example, an effect that differs on Mondays), which normalizes with more data
  • The population that saw the experiment early is different from slower adopters. This is common: a daily user will likely see your experiment before someone who uses your product once a month. Use the time series view for more insight.
  • A novelty effect made the experiment appear meaningful early on, but the effect faded. For example, users might click a changed button out of curiosity, then revert to prior behavior after the novelty wears off. Use the days-since-exposure view for more insight.

Best practice for timing is to pick a readout date when you launch your experiment (based on a power analysis), and to disregard the statistical interpretation of results until then. Reading results multiple times before the readout date dramatically increases the rate of false positives.
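
To see why repeated readouts inflate false positives, here is a minimal simulation sketch. It is not how Statsig computes results: it assumes A/A tests (no true effect), normally distributed metrics with known unit variance, ten evenly spaced looks, and a two-sided 5% threshold. All names and parameters are illustrative.

```python
import math
import random

def simulate_peeking(n_sims=400, looks=10, users_per_look=50, z_crit=1.96, seed=7):
    """Run A/A tests (no true effect) and compare two readout policies."""
    rng = random.Random(seed)
    any_look_hits = 0    # stat sig at ANY look, including interim ones
    final_look_hits = 0  # stat sig at the single planned readout
    for _ in range(n_sims):
        sum_test = sum_control = 0.0
        n = 0
        flagged_any = False
        z = 0.0
        for _ in range(looks):
            for _ in range(users_per_look):
                sum_test += rng.gauss(0.0, 1.0)
                sum_control += rng.gauss(0.0, 1.0)
            n += users_per_look
            # z-test for a difference in means with known unit variance
            z = (sum_test / n - sum_control / n) / math.sqrt(2.0 / n)
            if abs(z) > z_crit:
                flagged_any = True
        any_look_hits += flagged_any
        final_look_hits += abs(z) > z_crit  # z now holds the final look
    return any_look_hits / n_sims, final_look_hits / n_sims

peeking_rate, single_look_rate = simulate_peeking()
print(f"false positive rate, acting on any stat sig look: {peeking_rate:.1%}")
print(f"false positive rate, single planned readout:      {single_look_rate:.1%}")
```

Under these assumptions the single planned readout should land near the nominal 5%, while acting on any stat sig look should be noticeably higher.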

How should I start with interpreting results?

Start by using your scorecard metrics to understand whether you moved the metrics you expected to move. Bring a hypothesis about what your experiment should drive; your primary metrics should answer that hypothesis.

The delta displayed is based on the observed difference between a test and control population. The error bars visualize a confidence interval: a range of probable values for the difference between groups. If you repeated the experiment many times, about 95% of the resulting 95% confidence intervals would contain the true value of the difference. In practice, the CI is a representative range of what the true value might be.
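
As a rough illustration of how a delta and its confidence interval relate, the sketch below computes an absolute difference in means with a normal-approximation 95% interval. This is an assumption-laden example, not Statsig's calculation: the function name and sample data are hypothetical, and it ignores relative deltas and any variance reduction.

```python
import math
from statistics import mean, variance

def diff_in_means_ci(control, test, z=1.96):
    """Absolute delta (test - control) with a normal-approximation CI."""
    delta = mean(test) - mean(control)
    # Standard error of the difference between two independent sample means
    se = math.sqrt(variance(test) / len(test) + variance(control) / len(control))
    return delta, delta - z * se, delta + z * se

control = [1, 2, 3, 4, 5]
test = [2, 3, 4, 5, 6]
delta, ci_low, ci_high = diff_in_means_ci(control, test)
print(f"delta={delta:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

Here the interval spans zero, so this toy result would not be stat sig at the 5% level.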

These results are statistical interpretations, not facts:

  • If a result isn't stat sig, you don't have sufficient evidence to reject the null hypothesis (i.e., based on your experiment design the observed result is reasonably likely to have happened by chance).
    • Generally, treat these results as a lack of evidence for your hypothesis.
    • Underpowered tests may lead to neutral results even if a true effect exists.
  • If a result is stat sig, you have sufficient evidence to reject the null hypothesis (i.e., the probability of observing this result, or one more extreme, if there were no true difference between the groups is below the significance threshold you set in advance).
    • Generally, treat this result as evidence for your hypothesis.
    • Multiple comparisons (many metrics, rerunning an experiment, or grouping by dimensions) greatly increase the chance of seeing a stat sig result when there's not a true effect. Be wary of interpreting results when you see those behaviors!
    • A test that was extremely unlikely to succeed and has a stat sig result has a high chance of being a false positive. Consider trying to reproduce the result, running a back-test, or using a stricter (lower) significance threshold.
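
The last point can be made concrete with Bayes' rule. The sketch below is illustrative only: the prior probability that an idea truly works, the 80% power, and the 5% significance threshold are all assumed inputs, and the function name is hypothetical.

```python
def false_positive_risk(prior, power=0.8, alpha=0.05):
    """Share of stat sig results that are false positives, given a prior."""
    true_positives = prior * power          # real effects that are detected
    false_positives = (1 - prior) * alpha   # null effects flagged by chance
    return false_positives / (true_positives + false_positives)

# A long-shot idea (5% prior) versus a coin-flip idea (50% prior)
print(f"{false_positive_risk(0.05):.1%}")
print(f"{false_positive_risk(0.50):.1%}")
```

With these assumed inputs, more than half of the stat sig results for the long-shot idea would be false positives, which is why reproduction or a stricter threshold helps.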

After reviewing the scorecard results, use the all-metrics tab and custom queries to gather more information. Stat-sig movements in those views aren't necessarily statistically sound: increasing the number of metrics you examine increases the chance of a false positive. Use this section to look for unexpected large regressions and to generate follow-up hypotheses.
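
To get a feel for how quickly false positives accumulate as you look at more metrics, the following sketch computes the chance of at least one stat sig result when no metric has a true effect. It assumes the metrics are independent (real metrics are often correlated, so treat the numbers as indicative) and also shows a Bonferroni-adjusted threshold as one common, conservative correction. Function names are illustrative.

```python
def family_wise_error_rate(n_metrics, alpha=0.05):
    """P(at least one false positive) across independent null metrics."""
    return 1 - (1 - alpha) ** n_metrics

def bonferroni_alpha(n_metrics, alpha=0.05):
    """Per-metric threshold that keeps the overall rate at or below alpha."""
    return alpha / n_metrics

for m in (1, 5, 20):
    print(m, f"{family_wise_error_rate(m):.1%}", bonferroni_alpha(m))
```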

Results aren't showing up for some metrics

This normally happens when your company uses the SDK or event imports and also imports precomputed metrics from your data warehouse. Because these can run at different times, data availability may differ. Adjust your analysis date range to get a full view of your data.

Our external source shows more exposure events than Statsig. Are data missing?

Statsig doesn't count exposures on the last day (the day you made a decision). Filter out that day when you analyze your external data. The hours that define a "day" for your project depend on which timezone your project uses.
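
A minimal sketch of that filter, assuming your external exposures are available as timezone-aware UTC timestamps and that your project uses a fixed UTC-8 offset. Both are assumptions for illustration; substitute your own data source and project timezone.

```python
from datetime import date, datetime, timedelta, timezone

def exposures_before_decision_day(exposure_times_utc, decision_day, project_tz):
    """Keep exposures whose local calendar day is before the decision day."""
    kept = []
    for ts in exposure_times_utc:
        # Convert to the project timezone before deciding which day it is
        local_day = ts.astimezone(project_tz).date()
        if local_day < decision_day:
            kept.append(ts)
    return kept

project_tz = timezone(timedelta(hours=-8))  # assumed project timezone
exposures = [
    datetime(2024, 1, 10, 7, 0, tzinfo=timezone.utc),  # Jan 9, 23:00 in UTC-8
    datetime(2024, 1, 10, 9, 0, tzinfo=timezone.utc),  # Jan 10, 01:00 in UTC-8
]
comparable = exposures_before_decision_day(exposures, date(2024, 1, 10), project_tz)
print(len(comparable))  # prints 1
```

Both timestamps fall on January 10 in UTC, but only the second one falls on the decision day in the project timezone, which is why the conversion has to happen before the filter.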

We log categorical metadata for a custom event, but Pulse doesn't show these breakouts. What's wrong?

Pulse can show experiment results for sub-groups of your metric (for example, iOS vs. Android) only when you configure your metadata as a Dimension. Value Dimensions are the most common dimension type, as their metadata is logged directly with your custom events. Value Dimensions must be defined in your custom event setup.
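
Conceptually, a dimension breakout splits a metric by a metadata value within each experiment group. The sketch below illustrates the idea on in-memory data only; it does not use the Statsig SDK, and the event shape, the `platform` key, and the function name are hypothetical.

```python
from collections import defaultdict

def mean_by_dimension(events, dimension_key):
    """Average event value per (dimension value, experiment group)."""
    totals = defaultdict(lambda: [0.0, 0])  # (dim, group) -> [sum, count]
    for event in events:
        dim_value = event["metadata"].get(dimension_key, "unknown")
        bucket = totals[(dim_value, event["group"])]
        bucket[0] += event["value"]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

events = [
    {"group": "test", "value": 12.0, "metadata": {"platform": "iOS"}},
    {"group": "test", "value": 8.0, "metadata": {"platform": "Android"}},
    {"group": "control", "value": 10.0, "metadata": {"platform": "iOS"}},
    {"group": "control", "value": 6.0, "metadata": {"platform": "iOS"}},
]
breakout = mean_by_dimension(events, "platform")
print(breakout)
```

If the metadata key is not registered as a dimension, there is nothing to split on, which matches the behavior described above.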

Why do I see "No dimensions available for this time range"?

You may see this error when trying to view precomputed user dimensions, particularly after the first reload of the day. This happens because:

  • Dimensions load asynchronously in separate explore queries after the main scorecard results load.
  • The main experiment results appear first, while dimensions continue loading in the background.
  • Typically, dimensions become available within a few minutes after the main scorecard loads.

If you encounter this error, wait a few minutes and refresh the page to check whether the dimensions have completed loading.
