Experiments Overview

Learn the fundamentals of experimentation with Statsig, including key concepts, randomization units, and statistical significance.

Experimentation is a tool for making data-driven decisions that improve product outcomes and customer experiences.

This guide covers key concepts of experimentation: control variables, randomization units, and statistical significance. It helps you understand the science behind A/B testing and multivariate experiments and how to use experiments to validate product changes, discover new opportunities, and drive business impact.

What are experiments

Experiments enable you to run randomized controlled trials (A/B or A/B/n tests) to measure the impact of product changes on key metrics. Statsig’s experimentation platform is designed to make it straightforward to create, manage, and analyze experiments, ensuring you ship features that deliver value to your users and business.

Experiments are ideal when you want to:

Test multiple variants (A/B or A/B/n) of a product feature.
Run mutually exclusive experiments in parallel.
Measure the direct impact of changes on product and business metrics.

Why experiment

Controlled experiments are the most scientifically reliable way to establish causality between your product changes and their effect on customer behavior. By running experiments, you can:

Validate Hypotheses: Only ship features that have been proven to improve the customer experience or drive key business metrics.
Measure Success: Measure feature performance post-launch and detect any unexpected side effects.
Drive Innovation: Experiments give teams real-time feedback on product performance, enabling faster iteration and better, data-driven decisions.

Historical metrics may show correlation, but experiments allow you to establish causal relationships. Because randomization spreads uncontrolled external factors evenly across groups, observed differences can be attributed to the tested change rather than to those factors.

Key concepts

Control variables

A control variable is the variable in an experiment that is manipulated to observe its effect on key metrics. In a simple A/B test, the control variable usually has two values (A and B). Experiments with additional values (e.g., A, B, C, D) are known as A/B/n or multivariate experiments.

Variants

A variant is a specific version of the product or feature being tested. For example, in an A/B test:

A (Control): Represents the current state of the product or feature.
B (Treatment): Represents the modified state you want to evaluate.

Each user (or other randomization unit) is randomly assigned to one variant, allowing you to compare how the variants perform.

Randomization unit

The randomization unit is the entity (such as a user, device, or session) that is randomly assigned to control or treatment groups in an experiment. Choosing the right randomization unit ensures consistency in user experience and reliable experiment results.

This choice is critical to ensure that experiment results reflect real-world user behavior and that data isn't skewed by unintentional crossovers between groups.

Statistical significance

Statistical significance indicates whether the observed changes in metrics are likely due to the product change rather than random variation. Two commonly used measures are:

p-value: The p-value is the probability of observing results at least as extreme as those measured if the variant had no effect. A p-value below 0.05 is a commonly used threshold for statistical significance.
Confidence Interval: A confidence interval is a range of plausible values for the true effect of a variant at a given confidence level (e.g., 95%). If the confidence interval doesn't include zero, the effect is considered statistically significant.
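
To make the two measures concrete, here is a minimal sketch that compares conversion rates between control and treatment using a normal approximation. The function name, the sample counts, and the choice of a simple z-test are illustrative assumptions; this is not a description of how Statsig computes its results.

```python
import math

def compare_conversion(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Return (difference, two-sided p-value, 95% confidence interval).

    conv_a / n_a: conversions and sample size for control (A).
    conv_b / n_b: conversions and sample size for treatment (B).
    Uses a normal approximation, which assumes reasonably large samples.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Standard error of the difference between two independent proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

    # Two-sided p-value: probability of a difference at least this extreme
    # if the variant had no effect
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Confidence interval for the true difference (z_crit=1.96 gives ~95%)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, p_value, ci

# Hypothetical data: 1,000 of 10,000 control users convert vs. 1,100 of 10,000 in treatment
diff, p_value, ci = compare_conversion(1000, 10000, 1100, 10000)
print(f"lift={diff:.4f}, p-value={p_value:.4f}, 95% CI=({ci[0]:.4f}, {ci[1]:.4f})")
```

With these made-up counts, the p-value comes out at about 0.02 and the interval lies entirely above zero, so both measures agree that the lift is statistically significant at the 95% level.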

For more information on designing, monitoring, and analyzing experiments, refer to Product Experimentation Best Practices.

Common scenarios for experimentation

Optimize product growth

Use experiments to refine and optimize user experiences, helping you climb toward a local maximum in your product strategy. Common goals include:

Optimizing a specific user journey (e.g., improving onboarding).
Iterating on features to identify high-return opportunities.
Aligning experiments with business-critical metrics and guardrails to prevent negative side effects on fundamental business needs.

Explore new opportunities

Use exploratory experiments to discover entirely new directions. These experiments help you develop new ideas, validate strategies, and uncover long-term opportunities.

Run experiments over longer durations to account for novelty effects and adoption time.
Slowly ramp up experiments to minimize risk and build statistical power.
Test multiple related hypotheses to explore a broader business strategy.

Choosing the right randomization unit

The Randomization Unit is the variable that determines how users are distributed across your groups (for example, test and control). When you set a variable as the Randomization Unit, any value for that variable always receives the same experience. The Randomization Unit is also the reference unit for your metrics. For example, if you choose userID, each user is deterministically bucketed by their userID, and userID becomes the basis of measurement: your analysis might look at revenue per userID. The following are common units of randomization and when to use them.
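
As a rough illustration of deterministic bucketing and unit-level metrics, the sketch below hashes a unit ID together with an experiment name to pick a group, then averages revenue per userID within each group. The hashing scheme, the function names, and the experiment name are assumptions made for this example and do not describe Statsig's actual assignment algorithm.

```python
import hashlib
from collections import defaultdict

def assign_variant(unit_id, experiment, variants=("control", "test")):
    """Deterministically map a unit ID to a group.

    Hashing the experiment name together with the unit ID (an assumption
    of this sketch) keeps assignments independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]

def revenue_per_unit(events, experiment):
    """Average revenue per userID in each group.

    events: iterable of (user_id, revenue) pairs.
    """
    # Aggregate to the randomization unit first: one total per userID
    totals = defaultdict(float)
    for user_id, revenue in events:
        totals[user_id] += revenue

    # Then group the per-unit totals by the unit's assigned group
    by_variant = defaultdict(list)
    for user_id, total in totals.items():
        by_variant[assign_variant(user_id, experiment)].append(total)

    return {variant: sum(values) / len(values) for variant, values in by_variant.items()}

# The same userID always lands in the same group for a given experiment
assert assign_variant("user-123", "new_checkout") == assign_variant("user-123", "new_checkout")

# Hypothetical purchase events
events = [("user-1", 20.0), ("user-1", 5.0), ("user-2", 12.0), ("user-3", 0.0)]
print(revenue_per_unit(events, "new_checkout"))
```

Aggregating to one value per userID before averaging keeps the unit of analysis aligned with the unit of randomization.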

User identifiers

The most commonly used randomization unit is the User ID. Your application generates this identifier when a user registers or signs in. Using User IDs ensures a consistent user experience across sessions and devices, because the user is always assigned the same variant regardless of where or when they access the product.

Advantages:

Persistent across sessions and devices.
Independent of client-side cookies, which can be cleared by users.

For more details on using User IDs with Statsig, refer to Statsig Docs on User Identifiers.

Device identifiers

For users who haven't registered or signed in, using Device IDs or Anonymous User IDs is common. These identifiers track users based on their device and are ideal when experimenting with unregistered or guest users.

Example:

You can use device IDs to experiment on landing page optimizations aimed at improving user registration rates.

Drawbacks:

Device-specific: If the same user accesses your app from multiple devices, they may have different experiences.
Shared devices: If multiple users share a device, the experiment may mistakenly treat their behavior as belonging to one individual.

Statsig SDKs automatically generate Stable IDs for anonymous users, making it easier to manage device-based experiments. For more details, refer to Statsig Guide for Device-Level Experiments.
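
The general idea behind a persisted anonymous identifier can be sketched as follows. This is not the Statsig SDK's implementation: the function name, the storage key, and the use of a plain dictionary as a stand-in for cookies or local storage are all assumptions for illustration.

```python
import uuid

def get_or_create_device_id(storage, key="device_id"):
    """Return the identifier persisted on this device, creating it on first use.

    storage: any dict-like object standing in for cookies or local storage.
    """
    if key not in storage:
        storage[key] = str(uuid.uuid4())
    return storage[key]

# A dict stands in for one device's local storage
device_storage = {}
first_visit = get_or_create_device_id(device_storage)
later_visit = get_or_create_device_id(device_storage)
assert first_visit == later_visit  # the same device keeps the same ID

# Clearing storage produces a new ID, so the visitor looks like a new unit
device_storage.clear()
assert get_or_create_device_id(device_storage) != first_visit
```

The last two lines show why device-based identifiers are less durable than User IDs: once local state is cleared, the same person is treated as a new unit.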

Session identifiers

In certain cases, you may use Session IDs as the unit of randomization, particularly when testing behavior during a single session (for example, optimizing a checkout flow). Session-based randomization assumes each session is independent of others. This assumption may not hold if users return in multiple sessions.

Example:

Session IDs might be used when experimenting with conversion funnels for guest checkouts, which are typically completed within a single session.

Drawbacks:

Users may remember their experience from one session to another, undermining the assumption of session independence.
If users return in future sessions, they may be placed in different variants, leading to inconsistent user experiences.
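
The second drawback can be seen in a small sketch that buckets one returning user both ways. The hashing scheme, the IDs, and the experiment name are hypothetical and only illustrate the behavior; they do not reflect Statsig's assignment logic.

```python
import hashlib

def assign_variant(unit_id, experiment, variants=("control", "test")):
    """Deterministically map a unit ID to a variant (illustrative hashing scheme)."""
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]

# Hypothetical visit log: one returning user across 20 sessions
sessions = [("user-42", f"session-{i}") for i in range(20)]

# Randomizing on the user ID: every session sees the same variant
by_user = {assign_variant(user_id, "checkout_test") for user_id, _ in sessions}

# Randomizing on the session ID: each session is bucketed independently
by_session = {assign_variant(session_id, "checkout_test") for _, session_id in sessions}

print(len(by_user))     # always 1
print(len(by_session))  # 1 or 2; with many sessions, very likely both variants appear
```

When the session-level set contains both variants, the same person has seen both experiences, which is exactly the inconsistency described above.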

