Running Analysis Across Unit Types, AKA Cluster Experiments

Learn how to run analysis when the experiment assignment unit differs from the analysis unit.

There are two common scenarios where the experiment assignment unit differs from the analysis unit:

  1. Measuring session-level metrics for a user-level experiment. Ratio metrics are commonly used to solve this (covered on this page).
  2. Measuring logged-in metrics (e.g., revenue) on a logged-out experiment. There are two solutions:
     a. Run the experiment at the device level, with device-level metrics collected even after the user logs in.
     b. Use ID resolution.

This page explains how to set up the first scenario using Warehouse Native.

Workflow diagram for analyzing metrics at different ID levels

Example: Organizations and Users

Scenario:

  • Your metrics source has both org_id and user_id.
  • The relationship between org_id and user_id is 1-to-many. A single org_id can be associated with multiple users (user_id), but a user_id is only associated with a single org_id.
  • Statsig assigns your experiment at the org_id level.
  • You are interested in understanding the treatment effect at the user_id level, such as revenue per user.
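
Before setting up the experiment, it can help to confirm that the data actually satisfies the 1-to-many assumption. The following is a minimal sketch in Python, assuming the metric source rows have been exported as `(org_id, user_id)` pairs; the function name and sample rows are hypothetical and are not part of Statsig.

```python
from collections import defaultdict


def users_with_multiple_orgs(rows):
    """Return the user_ids that appear under more than one org_id.

    rows: iterable of (org_id, user_id) pairs (hypothetical export format).
    An empty result means every user_id maps to a single org_id.
    """
    orgs_by_user = defaultdict(set)
    for org_id, user_id in rows:
        orgs_by_user[user_id].add(org_id)
    return sorted(user for user, orgs in orgs_by_user.items() if len(orgs) > 1)


# Hypothetical sample: u1 appears under both org_a and org_b.
sample_rows = [
    ("org_a", "u1"),
    ("org_a", "u2"),
    ("org_b", "u3"),
    ("org_b", "u1"),
]
violations = users_with_multiple_orgs(sample_rows)
print(violations)
```

Any user_id returned here belongs to more than one org_id, so it could be exposed to more than one experiment group.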

1. Set up the metric source with org_id as an ID type

  • In this table, each row of data should have both org_id and user_id.

Metric source table setup with org_id and user_id fields

2. Choose your assignment source, where the unit of assignment is org_id.

Assignment source configuration selecting org_id unit

3. Define your metric of revenue per user_id

  • Set the denominator to the distinct count of user_id rather than the unit count. In an org_id-level experiment, the unit count is the distinct count of org_id, so using it would produce revenue per org instead of revenue per user.

Metric definition screenshot showing revenue per user_id formula
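
To illustrate why the choice of denominator matters, here is a minimal sketch in Python that computes both versions of the metric for one experiment group. It assumes rows of `(org_id, user_id, revenue)`; the function name and data are hypothetical, and this is not how Statsig computes the metric internally.

```python
def revenue_ratios(rows):
    """Compute revenue per distinct user and revenue per distinct org.

    rows: iterable of (org_id, user_id, revenue) tuples (hypothetical format).
    """
    total_revenue = 0.0
    users = set()
    orgs = set()
    for org_id, user_id, revenue in rows:
        total_revenue += revenue
        users.add(user_id)
        orgs.add(org_id)
    return {
        # Denominator: count distinct user_id (the metric defined in this step).
        "revenue_per_user": total_revenue / len(users),
        # Denominator: unit count, which equals count distinct org_id here.
        "revenue_per_org": total_revenue / len(orgs),
    }


# Hypothetical data: org_a has two users (u1 purchases twice), org_b has one.
rows = [
    ("org_a", "u1", 10.0),
    ("org_a", "u2", 20.0),
    ("org_a", "u1", 5.0),
    ("org_b", "u3", 25.0),
]
result = revenue_ratios(rows)
print(result)
```

With this sample, total revenue is 60 across 3 users and 2 orgs, so the two denominators give different answers to different questions.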

4. Set up the experiment with org_id

Experiment setup specifying org_id as unit type

How the Stats Engine handles cluster experiments

The Stats Engine uses the delta method to calculate variance and confidence intervals.

  • For mean metrics, the Stats Engine records the number of observations per exposed unit in the records column of the staging data. This value acts as the denominator or cluster-size value for delta calculations.
  • For general ratio metrics, the Stats Engine tracks the two component metrics (the numerator and the denominator) as independent metrics, then combines them during the pulse analysis to derive a single metric.

The delta method accounts for the covariance between the numerator and the denominator (for example, more users per org is correlated with more revenue). For more information about the delta method, go to Statsig - Delta Method Methodology, and refer to section 3 of this paper for details.
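
As an illustration of the idea, the sketch below applies the standard first-order delta method approximation to a ratio of means for a single experiment group, using per-org totals. It is a simplified example under stated assumptions, not the Stats Engine's implementation: the function name, the inputs `x` (revenue per org) and `y` (distinct users per org), and the sample data are all hypothetical.

```python
from statistics import mean, variance


def delta_method_ratio(x, y):
    """Estimate a ratio of means and its variance via the delta method.

    x: per-cluster numerator totals (e.g., revenue per org).
    y: per-cluster denominator totals (e.g., distinct users per org).
    Returns (ratio, variance_of_ratio).
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    ratio = x_bar / y_bar

    var_x = variance(x)  # sample variance of the numerator
    var_y = variance(y)  # sample variance of the denominator
    # Sample covariance between numerator and denominator across clusters.
    cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

    # First-order approximation:
    # Var(x_bar / y_bar) ~= (var_x - 2*R*cov_xy + R^2*var_y) / (n * y_bar^2)
    var_ratio = (var_x - 2 * ratio * cov_xy + ratio**2 * var_y) / (n * y_bar**2)
    return ratio, var_ratio


# Hypothetical per-org data: larger orgs tend to generate more revenue.
revenue_per_org = [30.0, 12.0, 45.0, 20.0]
users_per_org = [3, 1, 5, 2]
ratio, var_ratio = delta_method_ratio(revenue_per_org, users_per_org)
print(ratio, var_ratio)
```

The covariance term is what distinguishes this from treating each user as an independent observation. When revenue scales closely with the number of users in an org, the covariance term offsets much of the numerator and denominator variance.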

This approach is also relevant for analyzing event-level outcomes, such as average purchase value, where randomization occurs at the user level, and each user may experience multiple session events.
