Data Mapping
Map columns from your data warehouse to Statsig events, user IDs, and properties so ingested data flows into metrics and experiment analysis correctly.
How data mapping works
Statsig requires a specific data schema for processing. Statsig supports three types of datasets:
- Custom Events
- Precomputed Metrics
- Exposure Events
During setup, map the columns in your data output to the fields Statsig expects. Statsig runs a sample query to check for issues with data types, the mapping, or the base query.
Statsig casts fields into the appropriate type. For example, Statsig accepts string IDs, but an ID field left as an integer is also accepted.
Custom events
Custom events are sent by your application to measure the ongoing impact of your features and experiments.
Required
| Column | Description | Format/Rules |
|---|---|---|
| timestamp | The unix time your event was logged at | BIGINT. Cast timestamps into epoch time in seconds. |
| event_name | The name of the event | STRING/VARCHAR. Not null. Length < 128 characters |
| unit_id | Unique unit identifier | STRING/VARCHAR. User ID, Stable ID, etc. The same event row can have multiple IDs |
Optional
| Column | Description | Rules |
|---|---|---|
| event_value | The value of the event | STRING/VARCHAR. Length < 128 characters. Statsig detects numeric values. |
| event_metadata | Metadata columns about the event | (MANY) STRING:STRING. Statsig generates a metadata json field from however many metadata columns you provide. |
| metadata_json | Metadata json about the event | JSON STRING. Statsig unpacks this json 1 level deep. Nested values are stored as strings. |
An example dataset for events might look like this:
| unit_id | visit_id | event | timestamp | value | metadata_blob | user_type |
|---|---|---|---|---|---|---|
| 331444 | click | 1676484875 | {"click_target": "exit_details_button"} | power_user | ||
| 331444 | click | 1676484860 | {"click_target": "open_details_button"} | power_user | ||
| 265113 | click | 1676484333 | {"click_target": "button", "button_color": "green"} | churn_risk_user | ||
| 445332 | aeeer43d | visit | 1676483821 | {"page": "landing_page"} | new_user | |
| 224448 | checkout | 1676482222 | 33.22 | {"product_id": "11eefj", "product_category": "clothing"} | power_user |
Custom events have the following characteristics:
- One user can send multiple of the same event, with or without any changes in metadata. Statsig aggregates these together.
- You can send metadata in both of a json-formatted (only one-level deep) string, and/or pull in fields from columns. You can use metadata and values to generate custom metrics in the console, like sum(value) where "product_category"="clothing".
- You can send multiple IDs on a single event. For example, the visit above would count for both user and visit level metrics and experiments. During the mapping flow, specify which unit types your different IDs correspond to in Statsig.
Precomputed metrics
Use precomputed metrics to send complex metrics, or metrics that require delays due to attribution windows or long baking periods, to Statsig for experiment results.
Precomputed metrics in Statsig are expected to be calculated at a user-day granularity.
Required
| Column | Description | Format/Rules |
|---|---|---|
| unit_id | The unique user identifier this metric is for. This might not necessarily be a user_id - it could be a custom_id of some kind | STRING |
| id_type | The id_type the unit_id represents. | STRING. A valid ID type from your project |
| date | Date of the daily metric | DATE or ISOFORMATTED STRING. The date of the metric value for the unit_id provided |
| metric_name | The name of the metric | STRING (Not null). Length < 128 characters |
| metric_value | A numeric value for the metric | DOUBLE/NUMERIC. Metric value |
OR
Both of numerator/denominator need to be provided for Statsig to process the metric. | | numerator | Numerator for metric calculation | DOUBLE/NUMERIC. If present along with a denominator in any record, Statsig treats the metric as a ratio and only calculates it for users with non-null denominators. | | denominator | Denominator for metric calculation | DOUBLE/NUMERIC. If present along with a numerator in any record, Statsig treats the metric as a ratio and only calculates it for users with non-null denominators. |
An example dataset for metrics might look like this:
| unit_id | unit_type | date | metric_name | metric_value | numerator | denominator |
|---|---|---|---|---|---|---|
| 331444 | user_id | 2023-02-13 | clicks | 2 | ||
| aeeer43d | visit_id | 2023-02-13 | visits | 1 | ||
| 224448 | user_id | 2023-02-13 | checkout_rate | 2 | 15 | |
| 224448 | user_id | 2023-02-13 | clothing_checkout_value | 33.22 |
Precomputed metrics have the following characteristics:
- In this dataset, unit types are in different rows from each other
- Metrics can either have a value or a numerator/denominator pair. Statsig calculates any metric with a numerator/denominator pair as a ratio metric. Ratio takes priority over value; if you provide all 3 fields, Statsig treats it as a ratio metric.
- For users with null values, Statsig infers 0 for metric_value, and excludes null value users for ratio metrics.
Exposure events
Exposure events are generated by your assignment tool when it assigns users to a variant of an experiment (for example, show ad vs. hide ad).
Required
| Column | Description | Format/Rules |
|---|---|---|
| timestamp | The unix time your event was logged at | BIGINT. Cast timestamps into timezoneless unix time. |
| experiment | Your experiment identifier | STRING/VARCHAR. Not null. Length < 128 characters |
| group_id | Unique identifier for the experiment groups | STRING/VARCHAR. Not null. |
| unit_id | Unique user identifier | STRING/VARCHAR. User ID, Stable ID, etc. The same event row can have multiple IDs |
Optional
| Column | Description | Rules |
|---|---|---|
| metadata_json | Metadata json about the event | JSON STRING. Statsig unpacks this json 1 level deep. Nested values are stored as strings. |
Was this helpful?