Data Sources

Statsig Warehouse Native can read from a variety of source queries. Each query must conform to a simple interface, described below for each source type, but is otherwise very flexible.

Metric Sources

All Statsig needs to create metrics is a timestamp or date and a unit (or user) identifier. Context fields let you pull multiple metrics from the same base query, and select values to sum, average, filter on, or group by.

| Column Type | Description | Format/Rules |
| --- | --- | --- |
| timestamp | Required. An identifier of when the metric data occurred | Castable to Timestamp/Date |
| unit identifier | Required. At least one entity to which this metric belongs | Generally a user ID or similar |
| additional identifiers | Optional. Entity identifiers for reuse across identifier types | |
| context columns | Optional. Fields which will be aggregated, filtered, or grouped on | |

For example, you could pull from event logging and aggregate the event-level data to create metrics:

| timestamp | user_id | company_id | event | time_to_load | page_route |
| --- | --- | --- | --- | --- | --- |
| 2023-10-10 00:01:01 | my_user_17503 | c_22235455 | page_load | 207.22 | / |
| 2023-10-10 00:02:15 | my_user_18821 | c_22235455 | page_load | 522.38 | /search |
| 2023-10-10 00:02:22 | my_user_18821 | c_22235455 | serp_click | null | /search |

You could create an average time-to-load (TTL) metric by averaging time_to_load, then group it by page_route or filter it to specific routes when creating your metric.
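As a rough illustration of that aggregation, the sketch below averages time_to_load per page_route over the example rows. It is only a minimal sketch of the idea, not how Statsig computes the metric internally; the function name `mean_time_to_load_by_route` is hypothetical, and skipping null values (as SQL `AVG` does) is an assumption.

```python
from collections import defaultdict

# Rows copied from the example event table (unused columns omitted).
events = [
    {"user_id": "my_user_17503", "event": "page_load", "time_to_load": 207.22, "page_route": "/"},
    {"user_id": "my_user_18821", "event": "page_load", "time_to_load": 522.38, "page_route": "/search"},
    {"user_id": "my_user_18821", "event": "serp_click", "time_to_load": None, "page_route": "/search"},
]


def mean_time_to_load_by_route(rows):
    """Average time_to_load per page_route, skipping null values (assumption)."""
    values = defaultdict(list)
    for row in rows:
        if row["time_to_load"] is not None:
            values[row["page_route"]].append(row["time_to_load"])
    return {route: sum(v) / len(v) for route, v in values.items()}


averages = mean_time_to_load_by_route(events)
print(averages)  # {'/': 207.22, '/search': 522.38}
```

Filtering to specific routes would amount to dropping rows before the aggregation instead of grouping on page_route.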

As another example, you might pre-calculate some metrics yourself at a user-day grain - either to match your source-of-truth exactly or to add more complex logical fields:

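| timestamp | user_id | company_id | country | page_loads | satisfaction_score | revenue_usd | net_revenue_usd |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2023-10-10 | my_user_17503 | c_22235455 | US | 13 | 9 | 130.21 | 112.33 |
| 2023-10-10 | my_user_18821 | c_22235455 | CA | 1 | 2 | 0 | 0 |
| 2023-10-10 | my_user_18828 | c_190887 | DE | 0 | null | 22.1 | 0 |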

You can create different metrics by summing and filtering on those daily fields.

Assignment Sources

For experiment assignment sources, Statsig requires information on who was exposed, when, and to what experiment:

| Column Type | Description | Format/Rules |
| --- | --- | --- |
| timestamp | Required. An identifier of when the experiment exposure occurred | Castable to Timestamp/Date |
| unit identifier | Required. At least one entity to which this exposure belongs | Generally a user ID or similar |
| experiment identifier | Required. The experiment the exposure was for | Usually an experiment name |
| group identifier | Required. The experimental variant the user was assigned to | Usually a group name |
| additional identifiers | Optional. Entity identifiers for reuse across identifier types | |
| context columns | Optional. Fields which can be used to group by and filter results in exploratory queries | |

For example, you could pull from exposure event logging directly:

| timestamp | user_id | company_id | experiment_name | group_name | country |
| --- | --- | --- | --- | --- | --- |
| 2023-10-10 00:01:01 | my_user_17503 | c_22235455 | ranking_v1_vs_v2 | v1 | US |
| 2023-10-10 00:02:15 | my_user_18821 | c_22235455 | ranking_v1_vs_v2 | v2 | CA |
| 2023-10-10 00:02:22 | my_user_18821 | c_22235455 | search UI revamp | control | CA |
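Before pointing Statsig at an assignment query, it can help to sanity-check a few rows against the required columns listed above. The following is a minimal sketch of such a check, not a Statsig API: the helper `check_assignment_row` is hypothetical, the column names are taken from the example, and "castable to Timestamp/Date" is approximated here by Python's ISO-format parser.

```python
from datetime import datetime

# Required columns, using the column names from the example above.
REQUIRED = ("timestamp", "user_id", "experiment_name", "group_name")


def check_assignment_row(row):
    """Return a list of problems found in one assignment row (empty if none)."""
    problems = [f"missing {col}" for col in REQUIRED if not row.get(col)]
    if row.get("timestamp"):
        try:
            # Approximation of "castable to Timestamp/Date" (assumption).
            datetime.fromisoformat(row["timestamp"])
        except ValueError:
            problems.append("timestamp is not castable")
    return problems


good = {
    "timestamp": "2023-10-10 00:01:01",
    "user_id": "my_user_17503",
    "company_id": "c_22235455",
    "experiment_name": "ranking_v1_vs_v2",
    "group_name": "v1",
    "country": "US",
}
bad = {"timestamp": "yesterday", "user_id": "my_user_18821", "experiment_name": "ranking_v1_vs_v2"}

print(check_assignment_row(good))  # []
print(check_assignment_row(bad))   # ['missing group_name', 'timestamp is not castable']
```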

Qualifying Events

Qualifying events are used to simulate exposures for power analysis. They are similar to exposures, except they do not require experiment information. Context columns can be used to filter the qualifying event for power analysis. For example, you might define a qualifying event for page loads and filter to different page identifiers to run power analyses for experiments on different surfaces.

| Column Type | Description | Format/Rules |
| --- | --- | --- |
| timestamp | Required. An identifier of when the qualifying event occurred | Castable to Timestamp/Date |
| unit identifier | Required. At least one entity to which this qualifying event belongs | Generally a user ID or similar |
| additional identifiers | Optional. Entity identifiers for reuse across identifier types | |
| context columns | Optional. Fields which can be used to group by and filter results in exploratory queries | |

For example, you could pull from page load event logging directly:

| timestamp | user_id | company_id | page_route |
| --- | --- | --- | --- |
| 2023-10-10 00:01:01 | my_user_17503 | c_22235455 | / |
| 2023-10-10 00:02:15 | my_user_18821 | c_22235455 | /search |
| 2023-10-10 00:03:12 | my_user_22251 | c_9928 | /profile |
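To illustrate filtering a qualifying event on a context column, the sketch below selects the units that would count as "exposed" for a power analysis scoped to one page route. It is a simplified sketch under stated assumptions: the function name `qualifying_units` is hypothetical, and Statsig's actual power analysis does considerably more than collect unit IDs.

```python
# Rows copied from the example page load table.
qualifying_events = [
    {"timestamp": "2023-10-10 00:01:01", "user_id": "my_user_17503", "company_id": "c_22235455", "page_route": "/"},
    {"timestamp": "2023-10-10 00:02:15", "user_id": "my_user_18821", "company_id": "c_22235455", "page_route": "/search"},
    {"timestamp": "2023-10-10 00:03:12", "user_id": "my_user_22251", "company_id": "c_9928", "page_route": "/profile"},
]


def qualifying_units(rows, page_route, unit_column="user_id"):
    """Distinct units with at least one qualifying event on the given route."""
    return {row[unit_column] for row in rows if row["page_route"] == page_route}


# Simulated exposures for an experiment on the search surface.
print(qualifying_units(qualifying_events, "/search"))  # {'my_user_18821'}
```

Passing `unit_column="company_id"` would scope the same filter to the additional identifier instead.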

Entity Property Sources

For property sources, Statsig only needs a unit identifier (such as user_id) and property fields. Property sources can define fixed properties (e.g. a user's country of origin), but can also define dynamic properties, in which case you need to provide a timestamp so that Statsig can identify the most recent pre-exposure record.

| Column Type | Description | Format/Rules |
| --- | --- | --- |
| timestamp | Optional. An identifier of when the property was defined. Required for dynamic properties | Castable to Timestamp/Date |
| unit identifier | Required. At least one entity to which this property belongs | Generally a user ID or similar |
| property columns | Required. Fields which can be used to group by and filter results in exploratory queries | |

For example, a static property source could just be:

| user_id | company_id | country |
| --- | --- | --- |
| my_user_17503 | c_22235455 | US |
| my_user_18821 | c_22235455 | CA |

This could be used to filter and group results for any experiment that was exposed on either user_id or company_id.

For a dynamic property, it might look like this:

| user_id | timestamp | company_id | intent_segment | spend_segment |
| --- | --- | --- | --- | --- |
| my_user_17503 | 2023-10-10 | c_22235455 | high_intent | high |
| my_user_17503 | 2023-10-11 | c_22235455 | high_intent | high |
| my_user_17503 | 2023-10-12 | c_22235455 | mid_intent | high |
| my_user_18821 | 2023-10-10 | c_22235455 | low_intent | low |
| my_user_18821 | 2023-10-11 | c_22235455 | low_intent | mid |
| my_user_18821 | 2023-10-12 | c_22235455 | low_intent | mid |

The first user in this example has their intent_segment property change on 2023-10-12; based on what the intent_segment was prior to their exposure, they might have different intent_segment values for different experiment analyses.
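The "most recent pre-exposure record" lookup can be sketched as a point-in-time join. The code below is a minimal sketch, not Statsig's implementation: the function name `property_at_exposure` and the exposure dates are hypothetical, and treating "pre-exposure" as strictly earlier than the exposure date is an assumption.

```python
from datetime import date

# (user_id, timestamp, intent_segment) rows from the dynamic property example.
property_rows = [
    ("my_user_17503", date(2023, 10, 10), "high_intent"),
    ("my_user_17503", date(2023, 10, 11), "high_intent"),
    ("my_user_17503", date(2023, 10, 12), "mid_intent"),
    ("my_user_18821", date(2023, 10, 10), "low_intent"),
    ("my_user_18821", date(2023, 10, 11), "low_intent"),
    ("my_user_18821", date(2023, 10, 12), "low_intent"),
]


def property_at_exposure(rows, user_id, exposure_date):
    """Return the latest property value recorded strictly before exposure.

    Returns None when the user has no pre-exposure record.
    """
    candidates = [
        (ts, value) for uid, ts, value in rows
        if uid == user_id and ts < exposure_date  # strict "<" is an assumption
    ]
    if not candidates:
        return None
    # max() orders by timestamp first, so this picks the most recent record.
    return max(candidates)[1]


# Hypothetical exposure dates for two different experiments.
print(property_at_exposure(property_rows, "my_user_17503", date(2023, 10, 12)))  # high_intent
print(property_at_exposure(property_rows, "my_user_17503", date(2023, 10, 13)))  # mid_intent
```

The same user resolves to different intent_segment values depending on when they were exposed, which is why dynamic properties need a timestamp.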