Running Analysis Across Unit Types, AKA Cluster Experiments

Learn how to run analysis when the experiment assignment unit differs from the analysis unit.

There are two common scenarios where the experiment assignment unit differs from the analysis unit:

1. Measuring session-level metrics for a user-level experiment. Ratio metrics commonly solve this scenario (covered on this page).
2. Measuring logged-in metrics (e.g., revenue) on a logged-out experiment. There are two solutions:
   a. Running the experiment at the device level, with device-level metrics collected even after the user is logged in.
   b. Using ID resolution.

This page explains how to set up the first scenario using Warehouse Native.

Workflow diagram for analyzing metrics at different ID levels

Example: Organizations and users

Scenario:

Your metrics source has both org_id and user_id.
The relationship between org_id and user_id is 1-to-many. A single org_id can map to multiple users (user_id), but a user_id maps to only a single org_id.
Statsig assigns your experiment at the org_id level.
You're interested in understanding the treatment effect at the user_id level, such as revenue per user.
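Before relying on this setup, it can help to confirm that the data actually satisfies the 1-to-many assumption. The following is a minimal sketch, not a Statsig feature: the function name `users_with_multiple_orgs` and the in-memory row format are hypothetical, and in practice the same check would usually run as a query against your warehouse.

```python
from collections import defaultdict

def users_with_multiple_orgs(rows):
    """Return {user_id: set of org_ids} for every user that maps to more than one org."""
    orgs_by_user = defaultdict(set)
    for row in rows:
        orgs_by_user[row["user_id"]].add(row["org_id"])
    return {user: orgs for user, orgs in orgs_by_user.items() if len(orgs) > 1}

# Hypothetical sample rows shaped like the metric source described above.
rows = [
    {"org_id": "org_a", "user_id": "u1", "revenue": 10.0},
    {"org_id": "org_a", "user_id": "u2", "revenue": 5.0},
    {"org_id": "org_b", "user_id": "u3", "revenue": 7.5},
]

violations = users_with_multiple_orgs(rows)
print(violations)  # {}
```

An empty result means every user_id maps to a single org_id, which is what the scenario assumes.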

1. Set up the metric source with `org_id` as an ID type

In this table, each row of data should have both org_id and user_id.

Metric source table setup with org_id and user_id fields

2. Choose your assignment source, where the unit of assignment is `org_id`

Assignment source configuration selecting org_id unit

3. Define your metric of revenue per `user_id`

Set the denominator to the distinct count of user_id rather than the unit count. In an org_id-level experiment, the unit count equals the distinct count of org_id, so using it would produce revenue per org instead of revenue per user.

Metric definition screenshot showing revenue per user_id formula
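To make the numerator and denominator concrete, here is a small sketch of the arithmetic behind a revenue-per-user ratio metric. It is an illustration only: the function `revenue_per_user`, the `assignments` mapping, and the sample rows are hypothetical and do not reflect how Statsig computes the metric internally.

```python
def revenue_per_user(rows, assignments):
    """Compute total revenue / distinct user_id count for each experiment group.

    rows: dicts with org_id, user_id, and revenue.
    assignments: {org_id: group name}; orgs missing from it are treated as unexposed.
    """
    totals = {}
    users = {}
    for row in rows:
        group = assignments.get(row["org_id"])
        if group is None:
            continue  # skip rows from orgs that were never exposed
        totals[group] = totals.get(group, 0.0) + row["revenue"]
        users.setdefault(group, set()).add(row["user_id"])
    return {group: totals[group] / len(users[group]) for group in totals}

# Hypothetical data: one org per group, several users per org.
rows = [
    {"org_id": "org_a", "user_id": "u1", "revenue": 10.0},
    {"org_id": "org_a", "user_id": "u1", "revenue": 20.0},
    {"org_id": "org_a", "user_id": "u2", "revenue": 30.0},
    {"org_id": "org_b", "user_id": "u3", "revenue": 10.0},
    {"org_id": "org_b", "user_id": "u4", "revenue": 20.0},
    {"org_id": "org_b", "user_id": "u5", "revenue": 0.0},
]
assignments = {"org_a": "test", "org_b": "control"}

result = revenue_per_user(rows, assignments)
print(result)  # {'test': 30.0, 'control': 10.0}
```

Dividing by the unit count instead would give 60.0 for test and 30.0 for control, which is revenue per org.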

4. Set up the experiment with `org_id`

Experiment setup specifying org_id as unit type

How the Stats Engine handles cluster experiments

The Stats Engine uses the delta method to calculate variance and confidence intervals.

For mean metrics, the Stats Engine records the number of observations per exposed unit in the records column of the staging data. This value acts as the denominator, or cluster size, in the delta method calculations.
For general ratio metrics, the Stats Engine tracks the two component metrics (the numerator and the denominator) as independent metrics, then combines them during the pulse analysis to derive a single metric.

For more information about the delta method, go to Statsig - Delta Method Methodology. The delta method accounts for the covariance between the numerator and the denominator (for example, orgs with more users tend to generate more revenue). Refer to section 3 of this paper for details.
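For reference, the covariance term shows up explicitly in the standard first-order delta method approximation for a ratio of means. The notation here is an assumption made for illustration: within one experiment group, $n$ is the number of assignment units (org_id), $X_i$ is the numerator total for unit $i$ (revenue), and $Y_i$ is the denominator total for unit $i$ (distinct users). Statsig's exact estimator may differ in details.

```latex
\hat{R} = \frac{\bar{X}}{\bar{Y}},
\qquad
\operatorname{Var}(\hat{R}) \approx
\frac{1}{n}\left[
  \frac{\sigma_X^2}{\mu_Y^2}
  - \frac{2\,\mu_X\,\sigma_{XY}}{\mu_Y^3}
  + \frac{\mu_X^2\,\sigma_Y^2}{\mu_Y^4}
\right]
```

Here $\mu_X$ and $\mu_Y$ are the per-unit means, $\sigma_X^2$ and $\sigma_Y^2$ the per-unit variances, and $\sigma_{XY}$ the per-unit covariance between numerator and denominator. When the two are positively correlated, the middle term reduces the variance relative to treating them as independent.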

The delta method also applies to event-level outcomes, such as average purchase value, when randomization occurs at the user level and each user can generate multiple events.
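As a rough illustration of that calculation, the sketch below computes a ratio of means and its delta method variance from per-unit totals. It is a generic textbook version, not Statsig's implementation: the function `delta_method_ratio` and the sample data are hypothetical, and the 1.96 multiplier assumes a normal approximation for a 95% interval.

```python
import math
from statistics import mean

def delta_method_ratio(numerators, denominators):
    """Return (ratio of means, delta method variance) over n assignment units.

    numerators[i] and denominators[i] are the totals for unit i, for example
    total purchase value and purchase count for one user.
    """
    n = len(numerators)
    if n != len(denominators) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 units")
    mx, my = mean(numerators), mean(denominators)
    if my == 0:
        raise ValueError("mean of denominators must be non-zero")
    # Sample variances and covariance across units (n - 1 in the denominator).
    var_x = sum((x - mx) ** 2 for x in numerators) / (n - 1)
    var_y = sum((y - my) ** 2 for y in denominators) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(numerators, denominators)) / (n - 1)
    ratio = mx / my
    # First-order delta method: the covariance term corrects for the
    # correlation between numerator and denominator within a unit.
    variance = (var_x - 2 * ratio * cov_xy + ratio ** 2 * var_y) / (n * my ** 2)
    return ratio, variance

# Hypothetical per-user totals: purchase value and number of purchases.
purchase_totals = [30.0, 10.0, 60.0, 20.0]
purchase_counts = [3, 1, 4, 2]

ratio, variance = delta_method_ratio(purchase_totals, purchase_counts)
half_width = 1.96 * math.sqrt(variance)
print(f"average purchase value: {ratio:.2f} +/- {half_width:.2f}")
```

The same function works for the org and user example by passing per-org revenue as the numerators and per-org distinct user counts as the denominators.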
