On this page

Snowflake (Deprecated)

Import event and metric data into Statsig from Snowflake on a schedule, including authentication, queries, and column-to-event mappings.

This solution is still functional, but can be manual and time consuming to set up with minimal error handling. Check out the Data Warehouse Ingestion solution instead.

How Snowflake integration works

There are 2 ways to integrate with Snowflake: using a data connector, or through direct ingestion from Snowflake.

Using a data connector

To ingest events from Snowflake, you can use the Census integration.To export events to Snowflake, you can use the Fivetran integration.

Direct ingestion from Snowflake

Statsig also supports direct data ingestion from Snowflake. After setup, Statsig automatically pulls data from Snowflake into your events. Complete the following steps. Follow the checklist below to avoid delays.

1. Set up your Snowflake data warehouse and user for Statsig integration

Insert USER and PASSWORD values in the SQL below and run it in a Snowflake worksheet on an account with sysadmin and securityadmin roles.

This creates the table schemas and configuration that Statsig's ingestion uses. Statsig uses the user you create to access tables in the new Statsig schema. Use a unique and secure username and password, and replace the placeholder values in the first 2 statements.

sql
BEGIN;

  -- set up variable values to be used in statements later
  -- make sure to configure user_name and user_password with your own values
  SET user_name = '<USER>'; -- REPLACE WITH YOUR OWN VALUE
  SET user_password = '<PASSWORD>'; -- REPLACE WITH YOUR OWN VALUE
  SET role_name = 'STATSIG_ROLE';

  -- change role to sysadmin for warehouse / database steps
  USE ROLE sysadmin;

  -- create a warehouse, database, schema and tables for Statsig
  CREATE OR REPLACE WAREHOUSE statsig WITH warehouse_size='XSMALL';
  CREATE DATABASE IF NOT EXISTS statsig;
  CREATE SCHEMA IF NOT EXISTS statsig.statsig;

  -- a table for ingestion of raw events
  CREATE TABLE IF NOT EXISTS statsig.statsig.statsig_events(
      time BIGINT NOT NULL, -- unix time
      timeuuid STRING NOT NULL DEFAULT UUID_STRING(), --generated unique timeuuid
      user STRING NOT NULL, --json user object
      event_name STRING NOT NULL,
      event_value STRING,
      event_metadata STRING NOT NULL, --json metadata object
      event_version BIGINT,
      record_number NUMBER AUTOINCREMENT START 1 INCREMENT 1
  );

  -- a table for ingestion of metrics/user outcomes
  CREATE TABLE IF NOT EXISTS statsig.statsig.statsig_user_metrics(
      unit_id STRING NOT NULL,
      id_type STRING NOT NULL, -- stable_id, user_id, etc.
      date DATE NOT NULL, -- YYYY-MM-DD. Statsig calculates dates according to PST
      timeuuid STRING NOT NULL DEFAULT UUID_STRING(),
      metric_name STRING NOT NULL,
      metric_value NUMBER,
      numerator NUMBER,
      denominator NUMBER
  );

  CREATE TABLE IF NOT EXISTS statsig.statsig.statsig_events_signal(
      finished_date DATE
  );

  CREATE TABLE IF NOT EXISTS statsig.statsig.statsig_user_metrics_signal(
      finished_date DATE
  );

  -- change current role to securityadmin to create role and user for Statsig's access
  USE ROLE securityadmin;

  -- create role for Statsig
  CREATE ROLE IF NOT EXISTS identifier($role_name);
  GRANT ROLE identifier($role_name) TO ROLE SYSADMIN;

  -- create a user for Statsig
  CREATE USER IF NOT EXISTS identifier($user_name)
  password = $user_password
  default_role = $role_name
  default_warehouse = statsig;
  GRANT ROLE identifier($role_name) TO USER identifier($user_name);

  -- grant Statsig role access
  GRANT USAGE ON WAREHOUSE statsig TO ROLE identifier($role_name);
  GRANT USAGE ON DATABASE statsig TO ROLE identifier($role_name);
  GRANT USAGE ON SCHEMA statsig.statsig TO ROLE identifier($role_name);
  GRANT SELECT ON statsig.statsig.statsig_events TO ROLE identifier($role_name);
  GRANT SELECT ON statsig.statsig.statsig_user_metrics TO ROLE identifier($role_name);
  GRANT SELECT ON statsig.statsig.statsig_events_signal TO ROLE identifier($role_name);
  GRANT SELECT ON statsig.statsig.statsig_user_metrics_signal TO ROLE identifier($role_name);

COMMIT;

Confirm all statements ran successfully. This creates the schema and user that Statsig's ingestion expects.

2. Provide the credentials to Statsig

  • Go to console.statsig.com and log in.
  • Go to the settings page and navigate to the Integrations tab.
  • Find Snowflake in the integrations list and provide the requested credentials for the user you just created in step 1.
  • You can use the Test Connection button to confirm a connection to the table using the credentials provided.

Snowflake integration configuration interface

3. Load data into the new Statsig tables

Step 1 created 2 data tables and 2 signal tables. To load data into Statsig, load data into the data tables, and mark a day as completed in the corresponding signal table after all data for that day is loaded.

The statsig_events table is for sending raw events that were not logged to Statsig through the API. After loading, these events are processed as though they were logged directly.

The statsig_user_metrics table is for sending pre-computed metrics from your data warehouse. These metrics appear in the Statsig console and in your test results.

You may only need one of these tables. Follow the steps for the relevant table and ignore the other.

Your data must conform to these definitions and rules to avoid errors or delays:

Events (statsig_events)

The user object is a stringified JSON representation. An example might look like:

json
{
  "os": "Mac OS X",
  "os_version": "10.15.7",
  "browser_name": "Electron",
  "browser_version": "11.5.0",
  "ip": "1.1.1.1",
  "country": "KR",
  "locale": "en-US",
  "userID": "bbbbb-bbbbb-bbbbb-bbbbb",
  "custom": {
    "locale": "en-US",
    "clientVersion": "23.10.23.10",
    "desktopVersion": "11.5.0"
  },
  "customIDs": {
    "deviceId": "ddddd-ddddd-ddddd-ddddd",
    "stableID": "sssss-sssss-sssss-sssss"
  }
}

Key components of the user object are the userID, custom fields, and the customIDs object (notably stableID) if you use any custom identifiers.

Provide these fields where they exist and ensure field names are capitalized correctly. Not providing a unit identifier limits the utility of your events, because Statsig can't use them to build metrics like daily event users.

Metrics (statsig_user_metrics)

Metric ingestion is for user-day metric pairs. This is useful for measuring experimental results on complex business logic (for example, LTV estimates) that you generate in your data warehouse.

Note on metric values

If you provide both a numerator and denominator value for any record of a metric, Statsig treats the metric as a ratio metric, filters out users who don't have a denominator value from analysis, and recalculates the metric value using the numerator and denominator fields.

If you provide only a metric_value, Statsig uses the metric_value for analysis and imputes 0 for users without a metric value in experiments.

Scheduling

Because you may be streaming events to your tables or have multiple ETLs pointing to your metrics table, Statsig relies on you to signal that your metric/events for a given day are done.

After a day is fully loaded, insert that date as a row in the appropriate signal table: statsig_user_metrics_signal for metrics or statsig_events_signal for events. For example, after all metrics data is loaded into statsig_user_metrics for 2022-04-01, insert 2022-04-01 into statsig_user_metrics_signal.

Statsig expects you to load data in order. For example, if you've loaded up to 2022-04-01 and signal that 2022-04-03 has landed, Statsig waits for you to signal that 2022-04-02 has landed, then loads that data before ingesting data from 2022-04-03.

This ingestion pipeline is in beta and doesn't currently support automatic backfills or updates to data after Statsig ingests it. Only signal these tables are loaded after you've run data quality checks.

<a name="checklist" />

Checklist

Check these common errors before going live:

  • Field names are set incorrectly
    • Some field names may conflict with reserved keywords. Wrap them in quotes if needed. The SQL above works in the Snowflake console.
    • Run SELECT * on your tables to confirm there are no special characters in column names. Special characters cause ingestion to fail.
  • The id_type is set correctly.
    • Default types are user_id or stable_id. If you have custom ids, confirm that capitalization and spelling match, because these values are case sensitive (you can find your custom ID types in Project Settings in the Statsig console).
  • Your ids match the format of ids logged from SDKs.
    • In some cases, your data warehouse may transform IDs. Transformed IDs can prevent Statsig from joining your experiment or feature gate data to your metrics to calculate pulse or other reports. Go to the Metrics page of your project and view the log stream to check the format of the ids being sent (either User ID, or a custom ID in User Properties) to confirm they match.

If your data isn't showing up in the Statsig console:

  • Monitoring is limited, but you can check your Snowflake query history for the Statsig user to understand which data Statsig is pulling, and whether queries aren't executing (no history) or are failing.
  • You should see polling queries within a few hours of setting up your integration.
  • If you have a signal date in the last 28 days, you should see a select statement for data from the earliest signal date in that window.
  • If that query fails, try running it yourself to identify whether there is a schema issue.
  • If data is loading, Statsig is likely still processing. For new metrics, allow a day to process. If data hasn't loaded after a day or two, contact Statsig support. The most common reason for metrics catalog failures is id_type mismatches.

Was this helpful?