Experiments Overview

Learn the fundamentals of experimentation with Statsig, including key concepts, randomization units, and statistical significance.

Statsig experimentation runs randomized controlled trials (A/B or A/B/n tests) that measure how product changes affect your key metrics. Control variables, randomization units, and statistical significance determine whether an observed result reflects the change or random variation. Use experiments to validate product changes, discover new opportunities, and confirm causal impact before you ship.

What are experiments

Experiments enable you to run randomized controlled trials (A/B or A/B/n tests) to measure the impact of product changes on key metrics. Statsig’s experimentation platform helps you create, manage, and analyze experiments, ensuring you ship features that deliver value to your users and business.

Experiments are ideal when you want to:

Test multiple variants (A/B or A/B/n) of a product feature.
Run mutually exclusive experiments in parallel.
Measure the direct impact of changes on product and business metrics.

Why experiment

Controlled experiments are the most scientifically reliable way to establish causality between your product changes and their effect on customer behavior. By running experiments, you can:

Validate Hypotheses: Ship features only after experiments show that they improve the customer experience or drive key business metrics.
Measure Success: Measure feature performance post-launch and detect any unexpected side effects.
Drive Innovation: Experiments give teams real-time feedback on product performance, enabling faster iteration and better, data-driven decisions.

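Historical metrics may show correlation, but experiments allow you to establish causal relationships. Because units are assigned to groups at random, uncontrolled external factors (such as differences in user mix or concurrent marketing campaigns) are spread evenly across groups on average, so a statistically significant difference between groups can be attributed to the tested change rather than to those factors.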
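The toy simulation below is a hedged illustration of why random assignment supports causal claims: a hidden trait that nobody controls for (here, a hypothetical 30% share of "power users") ends up in roughly equal proportions in both groups, so it cannot explain a difference between them. It uses Python's standard `random` module; the `simulate_balance` function and the 30% figure are illustrative assumptions, not how Statsig assigns users.

```python
import random


def simulate_balance(n_users: int, seed: int = 7) -> tuple[float, float]:
    """Return the share of hypothetical 'power users' in control and in treatment."""
    rng = random.Random(seed)
    counts = {"control": [0, 0], "treatment": [0, 0]}  # [power users, total users]
    for _ in range(n_users):
        # Hidden trait the experimenter never controls for: ~30% of users have it.
        is_power_user = rng.random() < 0.30
        # Coin-flip assignment ignores the trait entirely.
        group = "control" if rng.random() < 0.5 else "treatment"
        counts[group][0] += is_power_user
        counts[group][1] += 1
    return (
        counts["control"][0] / counts["control"][1],
        counts["treatment"][0] / counts["treatment"][1],
    )


control_share, treatment_share = simulate_balance(100_000)
print(f"control: {control_share:.3f}, treatment: {treatment_share:.3f}")
```

Both shares land close to 0.30, which is the sense in which randomization "reduces the influence" of external factors: it balances them across groups rather than removing them.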

Key concepts

Control variables

A control variable is the variable in an experiment that you manipulate to observe its effect on key metrics. In a simple A/B test, the control variable usually has two values (A and B). More complex experiments may have additional values (e.g., A, B, C, D); these are known as multivariate or A/B/n experiments.

Variants

A variant is a specific version of the product or feature you're testing. For example, in an A/B test:

A (Control): Represents the current state of the product or feature.
B (Treatment): Represents the modified state you want to evaluate.

Statsig randomly assigns users to variants, allowing you to compare how each variant performs.

Randomization unit

The randomization unit is the entity (such as a user, device, or session) that Statsig randomly assigns to control or treatment groups in an experiment. Choosing the right randomization unit ensures consistency in user experience and reliable experiment results.
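A common way to make random assignment consistent for a given unit is to hash the unit's ID together with an experiment-specific salt. The sketch below assumes a salted SHA-256 hash and an equal-weight split; the `assign_group` function, the `"salt.unitID"` input format, and the salt value are hypothetical illustrations, not Statsig's documented bucketing algorithm.

```python
import hashlib


def assign_group(unit_id: str, experiment_salt: str, groups: list[str]) -> str:
    """Deterministically map a randomization unit to one of `groups` with equal weights.

    Hashing (salt + unit ID) means the same unit always lands in the same group
    for a given experiment, while different salts split units independently.
    """
    digest = hashlib.sha256(f"{experiment_salt}.{unit_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce it to a group index.
    bucket = int.from_bytes(digest[:8], "big") % len(groups)
    return groups[bucket]


groups = ["control", "treatment"]
first = assign_group("user-123", "checkout_test", groups)
again = assign_group("user-123", "checkout_test", groups)
assert first == again  # same unit, same experiment -> same group every time
print(first)
```

Because the output depends only on the unit ID and the salt, no assignment table is needed, and any service that computes the hash the same way serves the same variant. Changing the salt reshuffles every unit, so in a scheme like this the salt would stay fixed for the life of an experiment.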

This choice is critical to ensure that experiment results reflect real-world user behavior and that unintentional crossovers between groups don't skew the data.

Statistical significance

Statistical significance determines whether the observed changes in metrics are likely due to the product change or random variation. Two commonly used methods are:

p-value: The p-value is the probability of observing a result at least as extreme as the one measured, assuming the variant had no effect. A p-value below 0.05 typically indicates statistical significance.
Confidence Interval: A confidence interval is a range of plausible values for the true effect of a variant, computed at a given confidence level (e.g., 95%). If the confidence interval doesn't include zero, the effect is statistically significant at that confidence level.
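The p-value and confidence interval can both be sketched with a two-sample z-test on a metric's mean, a standard normal-approximation method. This is a minimal sketch that assumes large samples and precomputed means and variances; it is not necessarily the exact test Statsig runs, and the function names and example numbers are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_sample_z_test(mean_c, var_c, n_c, mean_t, var_t, n_t, z_crit=1.96):
    """Compare treatment vs. control means using a normal approximation.

    Returns (absolute lift, two-sided p-value, (ci_low, ci_high)).
    z_crit=1.96 corresponds to a ~95% confidence interval.
    """
    lift = mean_t - mean_c
    std_err = math.sqrt(var_c / n_c + var_t / n_t)
    z = lift / std_err
    # Two-sided p-value: chance of a |z| at least this large if the true lift were 0.
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    ci = (lift - z_crit * std_err, lift + z_crit * std_err)
    return lift, p_value, ci


# Hypothetical summary statistics for one metric in each group.
lift, p_value, ci = two_sample_z_test(10.0, 25.0, 5000, 10.3, 25.0, 5000)
print(f"lift={lift:.2f}, p={p_value:.4f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
# lift=0.30, p=0.0027, 95% CI=(0.104, 0.496)
```

The two readings agree: the standard error is 0.1, so z = 3, the p-value falls well below 0.05, and the 95% interval excludes zero.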

For more information on designing, monitoring, and analyzing experiments, refer to Product Experimentation Best Practices.

Common scenarios for experimentation

Optimize product growth

Use experiments to refine and optimize user experiences, helping you climb toward a local maximum in your product strategy. Common goals include:

Optimizing a specific user journey (e.g., improving onboarding).
Iterating on features to identify high-return opportunities.
Aligning experiments with business-critical metrics and guardrails to prevent negative side effects on fundamental business needs.

Explore new opportunities

Use exploratory experiments to discover entirely new directions. These experiments help you develop new ideas, validate strategies, and uncover long-term opportunities.

Run experiments over longer durations to account for novelty effects and adoption time.
Slowly ramp up experiments to minimize risk and build statistical power.
Test multiple related hypotheses to explore a broader business strategy.

Choosing the right randomization unit

The Randomization Unit is the variable that determines how Statsig distributes users across your groups (for example, test and control). When you set a variable as the Randomization Unit, any value for that variable always receives the same experience. The Randomization Unit is also the reference unit for your metrics. If you choose userID, Statsig deterministically buckets each user by their userID, and userID also becomes the basis of measurement: for example, your analysis might look at Revenue per userID. The following are common units of randomization and when to use them.
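To show what "Revenue per userID" means when the randomization unit is also the unit of measurement, the sketch below first sums revenue per user and then averages those per-user totals within each group. The event list and the `revenue_per_unit` function are hypothetical; this is an illustration of the aggregation idea, not Statsig's metric pipeline.

```python
from collections import defaultdict

# Hypothetical purchase events: (userID, assigned group, revenue in dollars).
events = [
    ("u1", "control", 20.0),
    ("u1", "control", 5.0),
    ("u2", "control", 0.0),
    ("u3", "treatment", 30.0),
    ("u4", "treatment", 10.0),
    ("u4", "treatment", 15.0),
]


def revenue_per_unit(events):
    """Sum revenue per userID, then average those per-user totals within each group."""
    per_user = defaultdict(float)
    group_of = {}
    for user_id, group, revenue in events:
        per_user[user_id] += revenue
        group_of[user_id] = group
    totals, units = defaultdict(float), defaultdict(int)
    for user_id, revenue in per_user.items():
        totals[group_of[user_id]] += revenue
        units[group_of[user_id]] += 1
    return {group: totals[group] / units[group] for group in totals}


result = revenue_per_unit(events)
print(result)  # {'control': 12.5, 'treatment': 27.5}
```

Dividing by the number of users rather than the number of events keeps the metric aligned with the unit that was randomized; averaging per event in this data would give control about 8.33 instead of 12.5.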

User identifiers

The most commonly used randomization unit is the User ID. Your application generates this identifier when a user registers or signs in. Using User IDs ensures a consistent user experience across sessions and devices, because Statsig always assigns the user the same variant regardless of where or when they access the product.

Advantages:

Persistent across sessions and devices.
Independent of client-side cookies, which users can clear.

For more details on using User IDs with Statsig, refer to Statsig Docs on User Identifiers.

Device identifiers

For users who haven't registered or signed in, using Device IDs or Anonymous User IDs is common. These identifiers track users based on their device and are ideal when experimenting with unregistered or guest users.

Example:

You can use device IDs to experiment on landing page optimizations aimed at improving user registration rates.

Drawbacks:

Device-specific: If the same user accesses your app from multiple devices, they may have different experiences.
Shared devices: If multiple users share a device, the experiment may mistakenly treat their behavior as belonging to one individual.
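The device-specific drawback can be seen directly by keying the same hash-based assignment on two different identifiers. The sketch assumes a hypothetical 50/50 split computed from a salted SHA-256 hash; the `assign` function, the salt, and all IDs are illustrative, not Statsig's implementation.

```python
import hashlib


def assign(unit_id: str, salt: str = "landing_page_test") -> str:
    """Hypothetical 50/50 assignment from a salted SHA-256 hash of a unit ID."""
    digest = hashlib.sha256(f"{salt}.{unit_id}".encode("utf-8")).digest()
    return "treatment" if digest[0] % 2 else "control"


# One hypothetical person who signs in as the same user on two devices.
user_id = "user-42"
device_ids = ["phone-aaa", "laptop-bbb"]

by_user = {device: assign(user_id) for device in device_ids}
by_device = {device: assign(device) for device in device_ids}
print("keyed on userID:  ", by_user)    # identical group on both devices
print("keyed on deviceID:", by_device)  # may differ between devices

# Across many hypothetical two-device users, count how often the device-keyed
# groups disagree; keyed on userID they never do.
people = 10_000
split = sum(assign(f"phone-{i}") != assign(f"laptop-{i}") for i in range(people)) / people
print(f"share of two-device users split across variants by deviceID: {split:.2f}")
```

With a 50/50 split and independent hashes per device, roughly half of two-device users end up in both groups, which contaminates the comparison for exactly the users who use the product most broadly.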

Statsig SDKs automatically generate Stable IDs for anonymous users, making it easier to manage device-based experiments. For more details, refer to Statsig Guide for Device-Level Experiments.

Session identifiers

In certain cases, you may use Session IDs as the unit of randomization, particularly when testing behavior during a single session (for example, optimizing a checkout flow). Session-based randomization assumes each session is independent of others. This assumption may not hold if users return in multiple sessions.

Example:

You might use Session IDs when experimenting with conversion funnels for guest checkouts, which users typically complete within a single session.

Drawbacks:

Users may remember their experience from one session to another, undermining the assumption of session independence.
If users return in future sessions, Statsig may place them in different variants, leading to inconsistent user experiences.
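The cost of that inconsistency grows with the number of return visits. Under 50/50 session-level assignment, a user who returns for k independent sessions stays in one variant with probability 2 × 0.5^k, so the chance of seeing both variants is 1 - 0.5^(k-1), or 75% for three sessions. The sketch below checks that figure with a hypothetical salted SHA-256 assignment; the hash scheme, salt, and session IDs are assumptions for illustration.

```python
import hashlib


def assign(unit_id: str, salt: str = "guest_checkout_test") -> str:
    """Hypothetical 50/50 assignment from a salted SHA-256 hash of a unit ID."""
    digest = hashlib.sha256(f"{salt}.{unit_id}".encode("utf-8")).digest()
    return "treatment" if digest[0] % 2 else "control"


def share_seeing_multiple_variants(num_users: int, sessions_per_user: int) -> float:
    """Fraction of users whose sessions were not all assigned to the same variant."""
    mixed = 0
    for user in range(num_users):
        # Each session gets its own ID, so each is assigned independently.
        variants = {assign(f"user{user}-session{s}") for s in range(sessions_per_user)}
        mixed += len(variants) > 1
    return mixed / num_users


mixed_share = share_seeing_multiple_variants(10_000, 3)
print(f"{mixed_share:.3f}")  # close to 1 - 0.5**2 = 0.75
```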

Experiment detail tabs

When you open an experiment, its detail page organizes configuration and analysis into five tabs:

Setup: Configure the experiment's scorecard, allocation, targeting, groups, and parameters. Refer to Create an Experiment.
Diagnostics: View a live log stream of checks and events from your application to confirm your integration works as expected.
Results: Read exposures and Scorecard metric lifts. Refer to How to Read Experiment Results.
Explore: Build custom queries to break down results by user or event dimensions.
Summary: Review a high-level summary of the experiment.

