Metric Deltas

How Statsig Warehouse Native computes metric deltas to compare absolute and relative differences between experiment groups in scorecards.

Computing metric deltas

A metric delta refers to the difference in metric values between two groups, by default the test and control groups. The metric delta is the impact you measure when evaluating experiment results. To account for the different number of users (or units) in each group, Statsig compares the mean metric value per user, not the total.

Selecting groups

Statsig defines all deltas as the difference between a "treatment" group and a presumably unchanged "control" group. However, Statsig allows comparison between any two groups.

Pulse provides two different metric deltas. The absolute delta is the difference between the two means:

\Delta \overline{X}=\overline{X}_t-\overline{X}_c

Understanding the impact relative to the baseline value of the metric is often helpful. For example, an absolute delta of +1 clicks/user has different meanings with a baseline value of 1 (+100% increase) vs. a baseline value of 100 (+1% increase). Statsig computes the relative delta using the mean of the control group as the baseline:

\Delta \overline{X} \%=\frac{\overline{X}_t-\overline{X}_c}{\overline{X}_c} \times 100 \%
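
The two formulas above can be sketched in a few lines of Python. This is an illustrative helper, not Statsig's implementation; the function name `metric_deltas` and the choice to raise an error on a zero control mean are assumptions made for the example.

```python
def metric_deltas(mean_treatment: float, mean_control: float) -> tuple[float, float]:
    """Return (absolute delta, relative delta in percent).

    The relative delta uses the control mean as the baseline.
    Raising on a zero baseline is an assumption of this sketch.
    """
    absolute = mean_treatment - mean_control
    if mean_control == 0:
        raise ValueError("relative delta is undefined when the control mean is 0")
    relative_pct = absolute / mean_control * 100.0
    return absolute, relative_pct


# Same absolute delta of +1 clicks/user against two different baselines.
low_baseline = metric_deltas(mean_treatment=2.0, mean_control=1.0)
high_baseline = metric_deltas(mean_treatment=101.0, mean_control=100.0)
print(low_baseline, high_baseline)
```

Both calls return an absolute delta of +1, but the relative delta is about +100% against a baseline of 1 and about +1% against a baseline of 100, matching the clicks/user example.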

If you reverse the comparison order in Pulse to "control" vs. "treatment", Statsig computes every delta in the opposite direction, so the direction of change inverts.

Computing means

Computing group means correctly is critical for obtaining meaningful metric deltas. The methodology for calculating metric means depends on the metric type.

Event count and sum metrics

These metrics represent totals: number of times an event occurs, sum of time spent, total purchase amount, and similar values. The mean is the average user-level total during the analysis period.

The mean value of the metric $X$ for a group is:

\overline{X}=\frac{1}{N} \sum_{i=0}^N \sum_{d=0}^{n_i} X_{i, d}

where:

$N$ is the number of users in the group
$n_i$ is the number of days during the analysis period that user $i$ was in the experiment
$X_{i,d}$ is the metric value for user $i$ on day $d$

The group mean includes only metric values recorded after a user's exposure to the experiment.
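
As a rough illustration of the count and sum formula, the sketch below averages user-level totals. The input shape (a dict from a hypothetical user ID to that user's post-exposure daily values) and the function name `count_sum_mean` are assumptions for the example, not part of Statsig's pipeline.

```python
def count_sum_mean(daily_values_by_user: dict[str, list[float]]) -> float:
    """Mean of user-level totals for a count or sum metric.

    Each list holds one user's daily metric values after exposure.
    Users with no activity still count toward N with a total of 0.
    """
    n_users = len(daily_values_by_user)
    if n_users == 0:
        raise ValueError("group has no users")
    # Inner sum over days d, then outer sum over users i, divided by N.
    user_totals = [sum(days) for days in daily_values_by_user.values()]
    return sum(user_totals) / n_users


# Hypothetical purchase counts per day for three exposed users.
purchases = {
    "user_a": [2, 0, 1],  # total 3
    "user_b": [0, 0],     # total 0, still counted in N
    "user_c": [5],        # total 5
}
group_mean = count_sum_mean(purchases)
print(group_mean)
```

Here the user-level totals are 3, 0, and 5, so the group mean is 8/3 purchases per user regardless of how many days each user spent in the experiment.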

User accounting and Event User metrics (and legacy Event DAU)

Event User metrics set to "Daily Participation Rate" capture the number of distinct users that have the event each day. In Pulse results, Statsig normalizes these values by the number of days the user is in the experiment. The normalized value represents the probability that a user is daily active for that event (the daily participation rate). The group mean is:

\overline{X}=\frac{1}{N} \sum_{i=0}^N \frac{1}{n_i} \sum_{d=0}^{n_i} X_{i, d}

where:

$X_{i,d}$ takes the value 1 if user $i$ has the event on day $d$, and 0 otherwise.
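
The normalization by days in the experiment can be sketched as follows. The input shape (one 0/1 flag per day in the experiment for each hypothetical user) and the name `daily_participation_rate` are assumptions for illustration only.

```python
def daily_participation_rate(active_flags_by_user: dict[str, list[int]]) -> float:
    """Group mean of per-user participation rates.

    Each list has one 0/1 flag per day the user was in the experiment,
    so its length is n_i. Every user is assumed to have at least one day.
    """
    if not active_flags_by_user:
        raise ValueError("group has no users")
    per_user_rates = []
    for user, flags in active_flags_by_user.items():
        if not flags:
            raise ValueError(f"user {user!r} has no days in the experiment")
        # (1 / n_i) * sum over days of X_{i,d}
        per_user_rates.append(sum(flags) / len(flags))
    return sum(per_user_rates) / len(per_user_rates)


flags = {
    "user_a": [1, 1, 0, 0],  # active 2 of 4 days -> 0.5
    "user_b": [0, 0],        # never active       -> 0.0
    "user_c": [1],           # active 1 of 1 day  -> 1.0
}
rate = daily_participation_rate(flags)
print(rate)
```

Because each user is normalized by their own number of days, a user who joined late contributes on the same scale as one who was exposed from the start.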

Statsig computes the following user accounting metrics the same way: DAU, WAU, MAU_28day, L7, L14, L28.

For new user accounting (new_DAU, new_WAU, new_MAU_28day), Statsig counts users that are new xAU at some point during the analysis window. The group mean is:

\overline{X}=\frac{1}{N} \sum_{i=0}^N \max \left(X_i\right)

where $\max(X_i)$ is the maximum value of the new xAU metric for user $i$ during the analysis window.
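
A minimal sketch of the new user accounting mean, assuming each user's new xAU metric is available as a list of daily 0/1 values (the input shape and the name `new_xau_mean` are hypothetical):

```python
def new_xau_mean(new_flags_by_user: dict[str, list[int]]) -> float:
    """Share of users who were a new xAU on at least one day.

    Taking max() over a user's daily values yields 1 if the user was
    new at any point in the analysis window, and 0 otherwise.
    """
    if not new_flags_by_user:
        raise ValueError("group has no users")
    # default=0 treats a user with no recorded days as never new (an assumption).
    per_user_max = [max(flags, default=0) for flags in new_flags_by_user.values()]
    return sum(per_user_max) / len(per_user_max)


new_dau_flags = {
    "user_a": [0, 1, 0],  # new on day 2 -> 1
    "user_b": [0, 0, 0],  # never new    -> 0
}
new_dau_mean = new_xau_mean(new_dau_flags)
print(new_dau_mean)
```

Unlike the daily participation rate, this value is not divided by the number of days a user spent in the experiment.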

event_dau metrics are now in legacy support only and Statsig no longer creates them for new events. Existing event_dau metrics remain available for any of your new experiments, and Statsig continues to compute them daily. For all new events, create an event_user metric to measure daily active users.

Custom ratios, means, retention, and stickiness metrics

These metrics include click-through rate, average purchase value, sessions per user, and similar values. Statsig obtains each by dividing a numerator value, $X$ , by a denominator value, $Y$ . The mean value of a ratio metric $R$ for an experiment group is:

\overline{R}=\frac{\frac{1}{N} \sum_{i=0}^N \sum_{d=0}^{n_i} X_{i, d}}{\frac{1}{N} \sum_{i=0}^N \sum_{d=0}^{n_i} Y_{i, d}}=\frac{\overline{X}}{\overline{Y}}

where $N$ is the number of users in the experiment group that participate in the metric, that is, users with a non-zero denominator value. $X_{i,d}$ and $Y_{i,d}$ are the $X$ and $Y$ values for user $i$ on day $d$.

Different approaches exist for ratio metrics in experiments. Statsig selected this implementation because it's statistically sound and interpretable:

$R$ is the ratio of two means of independent observations: a set of user-level $X$ values and a set of user-level $Y$ values. You can therefore use the central limit theorem to separately obtain the summary statistics of $X$ and $Y$ .
Statsig computes the group means in the same way as the topline metric value, making the means easier to interpret and relate to the topline metric.
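
To make the ratio definition concrete, the sketch below computes the ratio of means and, for contrast, the mean of per-user ratios, which is a different quantity. The input shape (per-user numerator and denominator totals over the analysis period) and both function names are assumptions for the example.

```python
def ratio_of_means(totals_by_user: dict[str, tuple[float, float]]) -> float:
    """Ratio metric as mean(X) / mean(Y) over participating users.

    Each value is (X total, Y total) for one user. Users with a zero
    denominator do not participate in the metric and are excluded from N.
    """
    participants = [(x, y) for x, y in totals_by_user.values() if y != 0]
    if not participants:
        raise ValueError("no users with a non-zero denominator")
    n = len(participants)
    mean_x = sum(x for x, _ in participants) / n
    mean_y = sum(y for _, y in participants) / n
    return mean_x / mean_y


def mean_of_ratios(totals_by_user: dict[str, tuple[float, float]]) -> float:
    """Alternative shown only for comparison: average of per-user X / Y."""
    ratios = [x / y for x, y in totals_by_user.values() if y != 0]
    return sum(ratios) / len(ratios)


# Hypothetical (clicks, impressions) totals per user.
ctr_inputs = {
    "user_a": (1, 2),
    "user_b": (1, 10),
    "user_c": (0, 0),  # no impressions: excluded from N
}
ctr = ratio_of_means(ctr_inputs)
ctr_alt = mean_of_ratios(ctr_inputs)
print(ctr, ctr_alt)
```

In this example the ratio of means is 2 clicks over 12 impressions, about 0.167, while the mean of per-user ratios is 0.3. The ratio of means weights users by their denominator and matches the topline value of total clicks divided by total impressions.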

Event User one-time event

For custom event_user metrics with "One-Time Event" selected, Statsig computes how many users have the event at any time after entering the experiment. Statsig doesn't normalize this result by the number of days a user is in the experiment. The group mean is:

\overline{X}=\frac{1}{N} \sum_{i=0}^N X_{i}

where:

$N$ is the number of users in the group
$X_{i}$ takes the value 1 if user $i$ has the event at any point after entering the experiment, and 0 otherwise
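
A minimal sketch of the one-time event mean, assuming exposure and event timestamps are available per user. The data shapes, the name `one_time_event_mean`, and the inclusive comparison at the exposure timestamp are all assumptions for illustration.

```python
from datetime import datetime


def one_time_event_mean(
    exposure_by_user: dict[str, datetime],
    events_by_user: dict[str, list[datetime]],
) -> float:
    """Share of exposed users with at least one event after exposure.

    Events before exposure are ignored, and repeated events count once.
    Treating an event at the exact exposure timestamp as "after" is an
    assumption of this sketch.
    """
    if not exposure_by_user:
        raise ValueError("group has no users")
    converted = 0
    for user, exposed_at in exposure_by_user.items():
        events = events_by_user.get(user, [])
        if any(ts >= exposed_at for ts in events):
            converted += 1
    return converted / len(exposure_by_user)


exposures = {
    "user_a": datetime(2024, 1, 2),
    "user_b": datetime(2024, 1, 2),
    "user_c": datetime(2024, 1, 3),
    "user_d": datetime(2024, 1, 1),
}
events = {
    "user_a": [datetime(2024, 1, 1)],                        # pre-exposure only -> 0
    "user_b": [datetime(2024, 1, 3), datetime(2024, 1, 5)],  # counted once      -> 1
    "user_d": [datetime(2024, 1, 1, 12)],                    # after exposure    -> 1
}  # user_c has no events -> 0
one_time_mean = one_time_event_mean(exposures, events)
print(one_time_mean)
```

Two of the four exposed users have a qualifying event, so the group mean is 0.5 no matter how many days each user spent in the experiment.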
