Data Mapping

Overview

Statsig requires data to follow a specific schema so that it can be processed correctly. We support ingesting 3 types of datasets into our platform:

  1. Custom Events
  2. Precomputed Metrics
  3. Exposure Events

During setup, we ask you to map the columns in your data output to the fields Statsig expects. We then run a small sample query to check for basic issues with data types, the mapping, or the base query itself.

Note that we cast fields to the appropriate type. For example, Statsig expects string IDs, but it's fine to leave an ID field as an integer.


Custom Events

Custom events are emitted by your application and are used to measure the ongoing impact of your features and experiments.

Required

| Column | Description | Format/Rules |
| --- | --- | --- |
| timestamp | The unix time your event was logged at | BIGINT. Please cast timestamps into epoch time in seconds |
| event_name | The name of the event | STRING/VARCHAR. Not null. Length < 128 characters |
| unit_id | Unique unit identifier | STRING/VARCHAR. User ID, Stable ID, etc. The same event row can have multiple IDs |
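
As a rough illustration of the required columns above, the following Python sketch converts a datetime to epoch seconds and checks one event row before export. The helper names `to_epoch_seconds` and `check_event_row` are hypothetical and not part of any Statsig API, and treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timezone


def to_epoch_seconds(ts):
    """Convert a datetime to whole epoch seconds.

    Assumption: naive datetimes (no tzinfo) are treated as UTC.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return int(ts.timestamp())


def check_event_row(row):
    """Return a list of problems found in one event row (empty if none)."""
    problems = []
    if not isinstance(row.get("timestamp"), int):
        problems.append("timestamp must be an integer (epoch seconds)")
    name = row.get("event_name")
    if not name:
        problems.append("event_name must not be null")
    elif len(str(name)) >= 128:
        problems.append("event_name must be shorter than 128 characters")
    if row.get("unit_id") is None:
        problems.append("unit_id is missing")
    return problems


row = {
    "timestamp": to_epoch_seconds(datetime(2023, 2, 15, 18, 14, 35, tzinfo=timezone.utc)),
    "event_name": "click",
    "unit_id": 331444,  # integer IDs are fine; they get cast to strings
}
issues = check_event_row(row)
```

A check like this is only a local pre-flight; the sample query run during setup remains the real validation.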

Optional

| Column | Description | Rules |
| --- | --- | --- |
| event_value | The value of the event | STRING/VARCHAR. Length < 128 characters. Statsig will detect numeric values |
| event_metadata | Metadata columns about the event | (MANY) STRING:STRING. Statsig will generate a metadata JSON field from however many metadata columns you provide |
| metadata_json | Metadata JSON about the event | JSON STRING. Statsig will unpack this JSON 1 level deep. Nested values will be stored as strings |
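
To make the "1 level deep" rule for metadata_json concrete, here is a minimal Python sketch of one way such unpacking could behave. It is an illustration, not Statsig's implementation: the function name `unpack_one_level` is hypothetical, and re-serializing nested values with `json.dumps` is an assumption about what "stored as strings" looks like.

```python
import json


def unpack_one_level(metadata_json):
    """Unpack a JSON object one level deep.

    Top-level scalar values are kept as-is. Nested objects and arrays are
    re-serialized to JSON strings (an assumption for illustration).
    """
    parsed = json.loads(metadata_json)
    flat = {}
    for key, value in parsed.items():
        if isinstance(value, (dict, list)):
            flat[key] = json.dumps(value)
        else:
            flat[key] = value
    return flat


flat = unpack_one_level('{"page": "landing_page", "geo": {"country": "US"}}')
```

In this sketch, `page` stays a plain field while `geo` becomes a single string field, so anything you want to filter on is best kept at the top level.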

An example dataset for events might look like this:

| unit_id | visit_id | event | timestamp | value | metadata_blob | user_type |
| --- | --- | --- | --- | --- | --- | --- |
| 331444 | | click | 1676484875 | | {"click_target": "exit_details_button"} | power_user |
| 331444 | | click | 1676484860 | | {"click_target": "open_details_button"} | power_user |
| 265113 | | click | 1676484333 | | {"click_target": "button", "button_color": "green"} | churn_risk_user |
| 445332 | aeeer43d | visit | 1676483821 | | {"page": "landing_page"} | new_user |
| 224448 | | checkout | 1676482222 | 33.22 | {"product_id": "11eefj", "product_category": "clothing"} | power_user |

Note that:

  • One user can send the same event multiple times, with or without changes in metadata. Statsig will aggregate these together.
  • You can send metadata as a JSON-formatted string (one level deep only), pull it in from columns, or both. You can use metadata and values to generate custom metrics in the console, like sum(value) where "product_category"="clothing".
  • You can send multiple IDs on a single event. For example, the visit above would count toward both user-level and visit-level metrics/experiments. During the mapping flow, you tell us which unit types your different IDs correspond to in Statsig.
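
The custom metric mentioned above, sum(value) where "product_category"="clothing", can be sketched in Python against two of the example rows. This only illustrates the filter-and-sum idea; the function name `sum_value_where` is hypothetical, and the console's actual metric definition flow is not shown here.

```python
import json

# Two rows from the example dataset; only the fields the metric needs.
events = [
    {
        "unit_id": "331444",
        "event": "click",
        "value": None,
        "metadata_blob": '{"click_target": "exit_details_button"}',
    },
    {
        "unit_id": "224448",
        "event": "checkout",
        "value": "33.22",
        "metadata_blob": '{"product_id": "11eefj", "product_category": "clothing"}',
    },
]


def sum_value_where(rows, key, expected):
    """Sum numeric event values over rows whose metadata has key == expected."""
    total = 0.0
    for row in rows:
        metadata = json.loads(row["metadata_blob"])
        if metadata.get(key) == expected and row["value"] is not None:
            total += float(row["value"])
    return total


clothing_total = sum_value_where(events, "product_category", "clothing")
```

Only the checkout row matches the filter, so the click rows contribute nothing to the total.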


Precomputed Metrics

Precomputed metrics let you use Statsig for experiment results with metrics you calculate yourself. Use them to send complex metrics, or metrics that are delayed by attribution windows or long baking periods.

Precomputed metrics in Statsig are expected to be calculated at a user-day granularity.
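
As a sketch of what user-day granularity means in practice, the Python below rolls raw event rows up into one row per unit, day, and metric. The function name `to_user_day_rows` is hypothetical, using the event name as the metric name is a simplification, and cutting days at UTC midnight is an assumption; use whatever day boundary your pipeline already follows.

```python
from collections import Counter
from datetime import datetime, timezone


def to_user_day_rows(events, id_type="user_id"):
    """Count events per (unit, day, event name) and emit one metric row each.

    Assumption: days are cut at UTC midnight.
    """
    counts = Counter()
    for event in events:
        day = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc).date()
        counts[(str(event["unit_id"]), day.isoformat(), event["event_name"])] += 1
    return [
        {
            "unit_id": unit_id,
            "id_type": id_type,
            "date": day,
            "metric_name": name,
            "metric_value": float(count),
        }
        for (unit_id, day, name), count in sorted(counts.items())
    ]


rows = to_user_day_rows([
    {"unit_id": 331444, "event_name": "click", "timestamp": 1676484875},
    {"unit_id": 331444, "event_name": "click", "timestamp": 1676484860},
    {"unit_id": 265113, "event_name": "click", "timestamp": 1676484333},
])
```

Each output row carries the fields a precomputed metric needs: a unit, its ID type, a date, a metric name, and a numeric value.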

Required

| Column | Description | Format/Rules |
| --- | --- | --- |
| unit_id | The unique user identifier this metric is for. This might not necessarily be a user_id - it could be a custom_id of some kind | STRING |
| id_type | The id_type the unit_id represents | STRING. A valid ID type from your project |
| date | Date of the daily metric | DATE or ISO-formatted STRING. The date of the metric value for the unit_id provided |
| metric_name | The name of the metric | STRING (Not null). Length < 128 characters |
| metric_value | A numeric value for the metric | DOUBLE/NUMERIC. Metric value |

OR

Both numerator and denominator must be provided for Statsig to process the metric.

| Column | Description | Format/Rules |
| --- | --- | --- |
| numerator | Numerator for metric calculation | DOUBLE/NUMERIC. If present along with a denominator in any record, the metric will be treated as a ratio and only calculated for users with non-null denominators |
| denominator | Denominator for metric calculation | DOUBLE/NUMERIC. If present along with a numerator in any record, the metric will be treated as a ratio and only calculated for users with non-null denominators |

An example dataset for metrics might look like this:

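| unit_id | unit_type | date | metric_name | metric_value | numerator | denominator |
| --- | --- | --- | --- | --- | --- | --- |
| 331444 | user_id | 2023-02-13 | clicks | 2 | | |
| aeeer43d | visit_id | 2023-02-13 | visits | 1 | | |
| 224448 | user_id | 2023-02-13 | checkout_rate | | 2 | 15 |
| 224448 | user_id | 2023-02-13 | clothing_checkout_value | 33.22 | | |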

Note that:

  • In this dataset, each row belongs to a single unit type; different unit types appear in separate rows.
  • Metrics can have either a value or a numerator/denominator pair. We calculate any metric with a numerator/denominator pair as a ratio metric. Ratio takes priority over value: if you provide all 3 fields, we will assume it is a ratio metric.
  • For users with null values, we infer 0 for metric_value. For ratio metrics, users with null values are excluded.
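
The rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not Statsig's implementation: the function names `metric_kind` and `per_unit_inputs` are hypothetical, the rows are assumed to hold one metric on one day with at most one row per unit, and how Statsig aggregates the resulting values is out of scope.

```python
def metric_kind(rows):
    """Return "ratio" if any record has both numerator and denominator.

    Ratio takes priority over value, even when metric_value is also set.
    """
    has_pair = any(
        r.get("numerator") is not None and r.get("denominator") is not None
        for r in rows
    )
    return "ratio" if has_pair else "value"


def per_unit_inputs(rows, unit_ids):
    """Map each unit to the input used for the metric.

    Value metrics: units with no value get 0.
    Ratio metrics: units with a null denominator are left out.
    """
    by_unit = {r["unit_id"]: r for r in rows}
    if metric_kind(rows) == "ratio":
        return {
            u: (by_unit[u].get("numerator"), by_unit[u]["denominator"])
            for u in unit_ids
            if u in by_unit and by_unit[u].get("denominator") is not None
        }
    return {u: (by_unit.get(u, {}).get("metric_value") or 0) for u in unit_ids}


value_inputs = per_unit_inputs(
    [{"unit_id": "331444", "metric_value": 2}],
    ["331444", "265113"],
)
ratio_inputs = per_unit_inputs(
    [
        {"unit_id": "224448", "numerator": 2, "denominator": 15},
        {"unit_id": "265113", "numerator": None, "denominator": None},
    ],
    ["224448", "265113"],
)
```

Here the unit with no click row is filled with 0 for the value metric, while the unit with a null denominator is dropped from the ratio metric.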


Exposure Events

NOTE: Exposure event import is deprecated. If this is an important use case, see Statsig Warehouse Native, available to Enterprise Customers

Exposure events are generated by your assignment tool, when it assigns your users to a certain variant of an experiment (e.g., show ad vs. hide ad).

Required

| Column | Description | Format/Rules |
| --- | --- | --- |
| timestamp | The unix time your event was logged at | BIGINT. Please cast timestamps into timezoneless unix time |
| experiment | Your experiment identifier | STRING/VARCHAR. Not null. Length < 128 characters |
| group_id | Unique identifier for the experiment groups | STRING/VARCHAR. Not null |
| unit_id | Unique user identifier | STRING/VARCHAR. User ID, Stable ID, etc. The same event row can have multiple IDs |

Optional

| Column | Description | Rules |
| --- | --- | --- |
| metadata_json | Metadata JSON about the event | JSON STRING. Statsig will unpack this JSON 1 level deep. Nested values will be stored as strings |