Data Mapping
Overview
Statsig requires certain data schema in order for proper processing. We support 3 different types of datasets to be ingested into our platform:
- Custom Events
- Precomputed Metrics
- Exposure Events
During setup, we will ask you to map columns in your data output to fields Statsig expects and run a small sample query to make sure that there aren't any basic issues with data types, the mapping, or the base query itself.
Please note that we will cast fields into the appropriate type. For example, Statsig accepts String IDs, but it's okay to leave an ID field as an integer.
Custom Events
Events that are emitted by your application to measure the ongoing impact of your features and experiments.
Required
Column | Description | Format/Rules |
---|---|---|
timestamp | The unix time your event was logged at | BIGINT. please cast timestamps into epoch time in seconds |
event_name | The name of the event | STRING/VARCHAR. Not null. Length < 128 characters |
unit_id | Unique unit identifier | STRING/VARCHAR. User ID, Stable ID, etc. The same event row can have multiple IDs |
Optional
Column | Description | Rules |
---|---|---|
event_value | The value of the event | STRING/VARCHAR. Length < 128 characters. Statsig will detect numeric values |
event_metadata | Metadata columns about the event | (MANY) STRING:STRING. Statsig will generate a metadata json field from however many metadata columns you provide |
metadata_json | Metadata json about the event | JSON STRING. Statsig will unpack this json 1 level deep. Nested values will be stored as strings |
An example dataset for events might look like this:
unit_id | visit_id | event | timestamp | value | metadata_blob | user_type |
---|---|---|---|---|---|---|
331444 | click | 1676484875 | {"click_target": "exit_details_button"} | power_user | ||
331444 | click | 1676484860 | {"click_target": "open_details_button"} | power_user | ||
265113 | click | 1676484333 | {"click_target": "button", "button_color": "green"} | churn_risk_user | ||
445332 | aeeer43d | visit | 1676483821 | {"page": "landing_page"} | new_user | |
224448 | checkout | 1676482222 | 33.22 | {"product_id": "11eefj", "product_category": "clothing"} | power_user |
Note that:
- One user can send multiple of the same event, with or without any changes in metadata. Statsig will aggregate these together.
- You can send metadata in both of a json-formatted (only one-level deep) string, and/or pull in fields from columns. You can use metadata and values to generate custom metrics in the console, like sum(value) where "product_category"="clothing".
- You can send multiple IDs on a single event. For example, the visit above would could for both user and visit level metrics/experiments. During the mapping flow you tell us which unit types your different IDs correspond to in statsig.
Precomputed Metrics
Precomputed metrics are a powerful way to leverage statsig for experiment results. Use these to send complex metrics and metrics that require delays due to attribution windows or long baking periods.
Precomputed metrics in statsig are expected to be calculated at a user-day granularity.
Required
Column | Description | Format/Rules |
---|---|---|
unit_id | The unique user identifier this metric is for. This might not necessarily be a user_id - it could be a custom_id of some kind | STRING |
id_type | The id_type the unit_id represents. | STRING. A valid ID type from your project |
date | Date of the daily metric | DATE or ISOFORMATTED STRING. The date of the metric value for the unit_id provided |
metric_name | The name of the metric | STRING (Not null). Length < 128 characters |
metric_value | A numeric value for the metric | DOUBLE/NUMERIC. Metric value OR Both of numerator/denominator need to be provided for Statsig to process the metric. |
numerator | Numerator for metric calculation | DOUBLE/NUMERIC. If present along with a denominator in any record, the metric will be treated as ratio and only calculated for users with non-null denominators. |
denominator | Denominator for metric calculation | DOUBLE/NUMERIC. If present along with a numerator in any record, the metric will be treated as ratio and only calculated for users with non-null denominators. |
An example dataset for metrics might look like this:
unit_id | unit_type | date | metric_name | metric_value | numerator | denominator |
---|---|---|---|---|---|---|
331444 | user_id | 2023-02-13 | clicks | 2 | ||
aeeer43d | visit_id | 2023-02-13 | visits | 1 | ||
224448 | user_id | 2023-02-13 | checkout_rate | 2 | 15 | |
224448 | user_id | 2023-02-13 | clothing_checkout_value | 33.22 |
Note that:
- In this dataset, unit types are in different rows from each other
- Metrics can either have a value or a numerator/denominator pair. We will calculate any metric with numerator/denominator pair as a ratio metric. Ratio takes priority over value; if you provide all 3 fields, we will assume it is a ratio metric.
- For users with null values, we will infer 0 for metric_value, and exclude null value users for ratio metrics.
Exposure Events
NOTE: Exposure event import is deprecated. If this is an important use case, see Statsig Warehouse Native, available to Enterprise Customers
Exposure events are generated by your assignment tool, when it assigns your users to a certain variant of an experiment (e.g., show ad vs. hide ad).
Required
Column | Description | Format/Rules |
---|---|---|
timestamp | The unix time your event was logged at | BIGINT. please cast timestamps into timezoneless unix time |
experiment | Your experiment identifier | STRING/VARCHAR. Not null. Length < 128 characters |
group_id | Unique identifier for the experiment groups | STRING/VARCHAR. Not null. |
unit_id | Unique user identifier | STRING/VARCHAR. User ID, Stable ID, etc. The same event row can have multiple IDs |
Optional
Column | Description | Rules |
---|---|---|
metadata_json | Metadata json about the event | JSON STRING. Statsig will unpack this json 1 level deep. Nested values will be stored as strings |