Databricks Connection

Overview

To set up a connection with Databricks, you will need the following:

Your Databricks Server Hostname
An HTTP Path to a Cluster or SQL Warehouse
A staging database where Statsig can write results and intermediate tables
An access token with read access on your experiment data and write access to the staging database
Use either a Serverless SQL Warehouse or an always-on cluster. Statsig runs interactive queries during setup (and for some analyses), and these queries fail if the cluster takes several minutes to start up.

Start by enabling the Databricks source in your project settings.

Note on Databricks Access

If Databricks with a DBFS-based Delta Lake is your primary warehouse, permissions are managed within Databricks and are straightforward. If you use Databricks as an intermediary to other data sources, make sure that both Databricks and the Statsig user acting through Databricks have appropriate access to the underlying storage (e.g. S3 for Athena tables). These permissions are often reset at some point, which then starts causing errors. To check, try creating tables through the Statsig console and confirm that the permissions on Statsig's side are sufficient.
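As a complementary check from your own side (not a replacement for the Statsig console check), the sketch below creates and drops a small probe table in the staging database. It assumes you pass in a DB-API style cursor opened with the same access token you gave Statsig; the function name and the probe table name `statsig_permission_probe` are hypothetical.

```python
def check_staging_write_access(cursor, staging_database: str) -> bool:
    """Create and drop a probe table; return True when both statements succeed.

    `cursor` is any object with an `execute(sql)` method, for example a cursor
    from a Databricks SQL connection opened with the token Statsig uses.
    """
    # Hypothetical probe table name; pick one that cannot clash with real tables.
    probe_table = f"{staging_database}.statsig_permission_probe"
    try:
        cursor.execute(f"CREATE TABLE IF NOT EXISTS {probe_table} (id INT)")
        cursor.execute(f"DROP TABLE IF EXISTS {probe_table}")
    except Exception as error:  # error types differ by driver, so catch broadly
        print(f"Write check failed for {staging_database}: {error}")
        return False
    return True
```

If this returns False after having worked previously, that is consistent with the permission reset described above, and the storage-level grants are the first place to look.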

Getting Connection Information

Follow the Databricks documentation to get the hostname and HTTP path of the cluster or SQL warehouse you'll use to run your experiment analysis. You may want to create a dedicated cluster for this use case.
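Before entering these values in Statsig, you may want to sanity-check them yourself. The following is a minimal sketch, assuming the `databricks-sql-connector` package is installed for the connection step. The path patterns are assumptions based on the usual shape of these values (`/sql/1.0/warehouses/<id>` for SQL warehouses, `sql/protocolv1/o/<workspace-id>/<cluster-id>` for clusters), and both function names are hypothetical.

```python
import re

# Assumed HTTP path shapes; confirm against the value shown in your workspace.
_WAREHOUSE_PATH = re.compile(r"^/?sql/1\.0/(warehouses|endpoints)/[0-9a-zA-Z]+$")
_CLUSTER_PATH = re.compile(r"^/?sql/protocolv1/o/\d+/[0-9a-zA-Z-]+$")


def classify_http_path(http_path: str) -> str:
    """Return 'warehouse', 'cluster', or 'unknown' for an HTTP path."""
    if _WAREHOUSE_PATH.match(http_path):
        return "warehouse"
    if _CLUSTER_PATH.match(http_path):
        return "cluster"
    return "unknown"


def check_connection(server_hostname: str, http_path: str, access_token: str) -> bool:
    """Run a trivial query to confirm the compute answers interactive queries."""
    # Imported lazily so classify_http_path works without the connector installed.
    from databricks import sql

    with sql.connect(
        server_hostname=server_hostname,
        http_path=http_path,
        access_token=access_token,
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            return cursor.fetchone()[0] == 1
```

If `check_connection` takes minutes to return, the cluster is probably starting from a stopped state, which is the situation that causes Statsig's interactive queries to fail.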

Follow these instructions to get the personal access token that will be used to calculate experiment results on your warehouse. Alternatively, you can follow these instructions to get a personal access token for a service principal.

Create or choose a database to use. For example, you could run this Python in a notebook cell:

staging_database_name = '<my_name>'
spark.sql(f"CREATE DATABASE IF NOT EXISTS {staging_database_name}")
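The principal behind the access token also needs read access on your experiment data and write access to the staging database. The sketch below only builds the `GRANT` statements so you can review them before running anything. It assumes Unity Catalog privilege names (`USE SCHEMA`, `SELECT`, `MODIFY`, `CREATE TABLE`); legacy Hive metastore table ACLs use different names. The function name and the example principal and schema names are hypothetical.

```python
def build_grant_statements(principal: str, staging_schema: str, data_schema: str) -> list[str]:
    """Build GRANT statements for a Statsig principal (Unity Catalog assumed).

    The principal may also need USE CATALOG on the parent catalog, which is
    not covered here.
    """
    return [
        # Read-only access to the schema that holds experiment data.
        f"GRANT USE SCHEMA, SELECT ON SCHEMA {data_schema} TO `{principal}`",
        # Read and write access to the staging schema.
        f"GRANT USE SCHEMA, CREATE TABLE, SELECT, MODIFY ON SCHEMA {staging_schema} TO `{principal}`",
    ]


# Print the statements for review instead of executing them.
for statement in build_grant_statements("statsig-user", "main.statsig_staging", "main.events"):
    print(statement)
```

In a notebook, each reviewed statement could then be passed to `spark.sql(...)` by a user who is allowed to grant these privileges.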

What IP addresses will Statsig access data warehouses from?

See FAQ
