Azure Metrics Upload (Deprecated)

Import event and metric data into Statsig from Azure Blob Storage on a schedule, including file format options and column-to-event mappings.

This solution still works, but setup is manual and time-consuming, and error handling is minimal. Use the Data Warehouse Ingestion solution instead.

How Azure metrics upload works

Statsig lets you upload pre-computed metrics data to a secure Azure blob that Statsig owns. Statsig ingests all uploaded metrics for a day after you signal that a given day is finished uploading.

Getting started

Reach out in Slack or to your primary Statsig point of contact. Statsig will set up an Azure blob storage container and provide credentials to connect.

Filesystem format

To allow for daily uploads, set up your blob storage container with the following folders:

  • events/ for events data
  • metrics/ for metrics data
  • signals/ for signal flags when you've finished uploading data for a day. You can omit this folder and use the mark_data_ready API instead, but you must use one or the other

Statsig recommends writing folders by date partitions for easier debugging, for example storing daily data in folders with ISO-formatted names (YYYY-MM-DD).
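As a minimal sketch of the date-partitioned layout described above, the helper below builds blob paths under the three folders. The function name `partition_path` and the file name `part-0000` are hypothetical placeholders. This page does not specify file naming or file format, so treat both as assumptions.

```python
from datetime import date

# Top-level folders named in the layout above.
FOLDERS = ("events", "metrics", "signals")


def partition_path(folder: str, day: date, filename: str) -> str:
    """Build a blob path with an ISO date partition (YYYY-MM-DD)."""
    if folder not in FOLDERS:
        raise ValueError(f"unknown folder: {folder!r}")
    # date.isoformat() yields YYYY-MM-DD.
    return f"{folder}/{day.isoformat()}/{filename}"


# "part-0000" is a placeholder file name, not a Statsig requirement.
print(partition_path("events", date(2022, 6, 22), "part-0000"))
```

Keeping one folder per day makes it easy to see which days have been uploaded when debugging a missing ingestion.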

Data format

Confirm your data conforms to the following schemas.

** Events **

| Column         | Description                                                                                                       | Rules                                                                                                   |
| -------------- | ----------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| timestamp      | UNIX timestamp of the event                                                                                       | UTC timestamp                                                                                           |
| event_name     | The name of the event                                                                                             | String under 128 characters, using `_` for spaces                                                       |
| event_value    | A string representing the value of a current event. Can represent a 'dimension' or a 'value'                      | Read as string format; Statsig converts numeric values into value                                       |
| event_metadata | A dictionary<string, string> in the form of a JSON string, containing named metadata for the event                | String format. Not null. Length < 128 characters                                                        |
| user           | A JSON object representing the user this event was logged for; see below                                          | Escaped JSON string including the keys 'custom' and 'customIDs'. A userID or customID must be provided. |
| timeuuid       | A unique UUID or timeUUID used for deduping. If omitted, Statsig generates one but it won't be effective for deduping | UUID format                                                                                         |

See Statsig User Object for available fields. An example user object:

{
  userID: "12345",
  customIDs: {
    stableID: "<device_id_here>",
    ...
  },
  email: "12345@gmail.com",
  userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.40 Safari/537.36",
  ip: "192.168.1.101",
  country: "US",
  locale: "en_US",
  appVersion: "1.0.1",
  systemName: "Android",
  systemVersion: "15.4",
  browserName: "Chrome",
  browserVersion: "45.0",
  custom: {
    new_user: "false",
    age: "22",
    ...
  },
}
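
The rules in the Events table can be checked before upload. The sketch below builds one event row as a Python dict and enforces the rules that are stated explicitly: the name length, the non-null metadata, and the requirement for a userID or customID. The helper name `build_event_row` is hypothetical, and using whole seconds for the UNIX timestamp is an assumption, because the table does not state the unit.

```python
import json
import time
import uuid


def build_event_row(event_name, user, event_value="", metadata=None, timestamp=None):
    """Return one event row keyed by the column names in the Events table."""
    if len(event_name) >= 128 or " " in event_name:
        raise ValueError("event_name must be under 128 characters and use '_' for spaces")
    # The user JSON must include the keys 'custom' and 'customIDs'.
    user_obj = {"custom": {}, "customIDs": {}, **user}
    if not user_obj.get("userID") and not user_obj["customIDs"]:
        raise ValueError("user needs a userID or a customID")
    return {
        # Assumption: seconds since the epoch (unit not stated in the table).
        "timestamp": int(timestamp if timestamp is not None else time.time()),
        "event_name": event_name,
        "event_value": str(event_value),
        # event_metadata must not be null, so default to an empty JSON object.
        "event_metadata": json.dumps({k: str(v) for k, v in (metadata or {}).items()}),
        "user": json.dumps(user_obj),
        # Time-based UUID so repeated uploads of the same row can be deduped
        # only if this value is stored and reused, not regenerated.
        "timeuuid": str(uuid.uuid1()),
    }


row = build_event_row(
    "add_to_cart",
    {"userID": "12345", "customIDs": {"stableID": "device-1"}},
    event_value=3,
    metadata={"sku": "A1"},
    timestamp=1655856000,
)
print(row["event_name"], row["event_value"], row["user"])
```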

** Metrics **

Include all of metric_value, numerator, and denominator. Write cast(null as double) for numerator and denominator if you are omitting them (or for metric_value if sending numerator/denominator).
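
The rule above can be sketched in Python, where `None` stands in for a null double. The helper name `metric_value_fields` is hypothetical. It only covers the three value columns named above, and it assumes a row carries either metric_value or a numerator/denominator pair, which is one reading of the sentence above rather than a documented constraint.

```python
def metric_value_fields(metric_value=None, numerator=None, denominator=None):
    """Return all three value columns, using None for the omitted ones."""
    has_ratio = numerator is not None or denominator is not None
    if has_ratio:
        if numerator is None or denominator is None:
            raise ValueError("numerator and denominator must be sent together")
        if metric_value is not None:
            raise ValueError("send metric_value or numerator/denominator, not both")
    elif metric_value is None:
        raise ValueError("send metric_value or numerator/denominator")

    def as_double(value):
        # Every column is always present; omitted ones are null doubles.
        return None if value is None else float(value)

    return {
        "metric_value": as_double(metric_value),
        "numerator": as_double(numerator),
        "denominator": as_double(denominator),
    }


print(metric_value_fields(metric_value=12))
print(metric_value_fields(numerator=3, denominator=10))
```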

Scheduling

Because you may be streaming events to your tables or have multiple ETLs pointing to your metrics table, Statsig relies on you to signal that your metrics or events for a given day are complete.

To signal completion, write a dataset with the single column finished_date, which contains all dates of data written to Statsig. For example, after writing data for 2022-06-22, insert a record with finished_date of 2022-06-22 to trigger ingestion of data up to and including 2022-06-22.
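
A minimal sketch of such a dataset follows. It renders the single finished_date column as CSV text. The CSV format and the helper name `finished_date_csv` are assumptions for illustration, because this page does not state which file format the signal dataset must use.

```python
import csv
import io
from datetime import date


def finished_date_csv(days):
    """Render finished dates as a one-column dataset (CSV is an assumption)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["finished_date"])
    # One row per completed day, de-duplicated and in chronological order.
    for day in sorted(set(days)):
        writer.writerow([day.isoformat()])
    return buf.getvalue()


print(finished_date_csv([date(2022, 6, 21), date(2022, 6, 22)]))
```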

Unlike the Snowflake integration, Statsig skips dates for blob storage uploads. If your latest finished date is 2022-06-22 and you insert 2022-07-01, Statsig ingests all data as of 2022-07-01 and infers that intermediate dates (for example, 2022-06-25) have data loaded.
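
To make the consequence of date skipping concrete, the sketch below lists every date that would be treated as loaded when a new finished date is inserted. It models the behavior described above for planning purposes only. The function name `dates_treated_as_loaded` is hypothetical and is not part of any Statsig API.

```python
from datetime import date, timedelta


def dates_treated_as_loaded(latest_finished: date, new_finished: date):
    """List the dates assumed complete after inserting new_finished."""
    if new_finished <= latest_finished:
        return []
    gap = (new_finished - latest_finished).days
    # Every day after the previous finished date, up to and including the new one.
    return [latest_finished + timedelta(days=i) for i in range(1, gap + 1)]


days = dates_treated_as_loaded(date(2022, 6, 22), date(2022, 7, 1))
print(len(days), days[0], days[-1])
```

If any date in that list is still missing data, upload it before inserting the later finished date.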

Alternatively, you can use the mark_data_ready API and send a timestamp indicating that all data before that timestamp has finished loading into your container.

Statsig processes events in PST. When you mark data ready for 2022-06-20, Statsig processes events from 2022-06-20T00:00 PST to 2022-06-20T23:59 PST. Account for this when scheduling your signals.
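
If your pipeline runs in UTC, it can help to compute the UTC window that corresponds to one PST day. The sketch below assumes PST means a fixed UTC-8 offset year-round. This page does not say whether daylight saving time is applied, so confirm that with Statsig before relying on it. The function name `pst_day_window_utc` is hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: PST is a fixed UTC-8 offset, with no daylight saving adjustment.
PST = timezone(timedelta(hours=-8), "PST")


def pst_day_window_utc(day: date):
    """Return (start, end) in UTC for one PST day; end is exclusive."""
    start = datetime.combine(day, time.min, tzinfo=PST)
    end = start + timedelta(days=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


start, end = pst_day_window_utc(date(2022, 6, 20))
print(start.isoformat(), end.isoformat())
```

Under this assumption, events for a given day are complete only after the returned end time has passed, so schedule the signal after that point.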

Checklist

Before going live, confirm each of the following to avoid common errors:

  • Field names are set correctly.
  • The id_type is set correctly.
    • Default types are user_id or stable_id. If you have custom IDs, confirm that capitalization and spelling match, because these values are case sensitive (you can find your custom ID types in Project Settings in the Statsig console).
  • Your IDs match the format of IDs logged from SDKs.
    • In some cases, your data warehouse may transform IDs. Transformed IDs can prevent Statsig from joining your experiment or feature gate data to your metrics to calculate Pulse or other reports. Go to the Metrics page of your project and view the log stream to check the format of the IDs being sent (either User ID, or a custom ID in User Properties) to confirm they match.
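
The id_type check in the list above can be automated before upload. The sketch below compares an id_type against the default types plus your own custom ID types and flags values that differ only in capitalization. The function name `check_id_type` and the example custom ID `companyID` are hypothetical. Take the real list of custom ID types from Project Settings.

```python
DEFAULT_ID_TYPES = {"user_id", "stable_id"}


def check_id_type(id_type, custom_id_types=()):
    """Return None if id_type is valid, else a short description of the problem."""
    allowed = DEFAULT_ID_TYPES | set(custom_id_types)
    if id_type in allowed:
        return None
    # ID types are case sensitive, so call out case-only mismatches explicitly.
    near = sorted(a for a in allowed if a.lower() == id_type.lower())
    if near:
        return f"id_type {id_type!r} differs only in case from {near[0]!r}"
    return f"id_type {id_type!r} is not a known ID type"


print(check_id_type("user_id"))
print(check_id_type("CompanyID", ["companyID"]))
```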