
Warehouse Costs

How you can manage costs with Statsig

This page is a high-level summary of how to think about warehouse costs and total cost of ownership (TCO) on Statsig Warehouse Native. It overlaps somewhat with the best practices page. Treat that page as the implementation guide and this one as the overview.

Why do costs matter?

Warehouse Native platforms require you to provide the compute and storage that run data jobs such as experiment analyses and analytics queries. This compute is a cost center. It can meaningfully increase your warehouse bill if queries are unoptimized or poorly integrated with your databases.

This is a major concern for many customers considering a Warehouse Native solution for Experimentation or Analytics. Warehouse Native platforms are very flexible, but that flexibility can lead to unexpected data costs. Understanding total cost of ownership is important when evaluating a platform, as many platforms don't treat it as a primary focus.

Statsig started as a cloud-only platform with a strong focus on cost efficiency. By continuously optimizing pipelines on the cloud side of its business, Statsig has developed expertise in cost-saving strategies that it passes on to Warehouse Native customers.

Benchmarks

Total cost of ownership

Warehouse costs typically fall between 5% and 20% of customers' platform spend with Statsig. There are exceptions depending on usage, but these tend to be on the low end. To avoid cherry-picking, the examples below are an anonymized list of the top 5 Statsig Warehouse Native customers on Snowflake, ranked by total warehouse spend:

This is fairly representative of what you can expect when using Statsig at scale with reasonable adherence to best practices. Costs do scale with the number of experiments, so companies running fewer experiments generally see a smaller relative TCO.

For a given experiment size, spend varies significantly depending on how many metrics you use, what kinds of metrics they are, and how "dense" they are. The table below shows typical costs when teams follow best practices:

Turbo Mode is especially efficient for large-scale experiments. It dramatically reduces memory usage, keeping large jobs from spilling to disk and reducing the need for XL clusters to provide enough memory.

Industry

Statsig aims to be best-in-class for Warehouse Native experimentation costs. Customers who evaluate multiple vendors consistently report that Statsig is 50-66% of the cost of other platforms.

In cases where Statsig has been more expensive than competitors, the team investigates quickly. For example, a recent customer accidentally used unpartitioned data for their Statsig proof of concept, which drove up cost relative to another vendor whose tables were partitioned. Statsig followed up by building warnings for when source data isn't partitioned. After partitioning, cost went from 120% of the other vendor's cost to under 60%.

Multiple customers have migrated to Statsig from other Warehouse Native Experimentation platforms due to cost concerns. Reach out to the Sales team to run an evaluation of relative costs. If you have concerns, or another party claims their solution is more efficient, contact Statsig to discuss.

Tools for understanding cost

Statsig aims to be as transparent as possible about costs. Numerous tools throughout the product help in this regard:

History

In every loaded experiment or gate, you can quickly see the cost of each load. The metrics shown are specific to your warehouse: for example, slot hours or bytes scanned for BigQuery, and uptime for Snowflake.


Pipeline overview dashboard

Statsig automatically creates a Pipeline Overview dashboard in its MEx platform. The underlying data is written to your warehouse, so you can investigate what is driving costs from Statsig. Many customers have built their own monitoring and alerting on top of this dataset and notify Statsig if they see any changes in cost.
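As an illustration of that kind of monitoring, here is a minimal Python sketch of a cost-spike check. It assumes the Pipeline Overview data has already been exported into rows holding a day and a compute cost. The `DailyCost` shape, the 7-day window, and the 1.5x threshold are hypothetical choices for this example, not Statsig's schema or defaults.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DailyCost:
    # Hypothetical row shape; the real Pipeline Overview columns may differ.
    day: str  # ISO date string, so lexical order matches date order
    compute_cost: float


def cost_spike(rows: list[DailyCost], window: int = 7, threshold: float = 1.5) -> bool:
    """Return True if the latest day's cost exceeds `threshold` times the
    average cost of the preceding `window` days."""
    if len(rows) < window + 1:
        return False  # not enough history to build a baseline
    ordered = sorted(rows, key=lambda r: r.day)
    baseline = mean(r.compute_cost for r in ordered[-window - 1:-1])
    latest = ordered[-1].compute_cost
    return latest > threshold * baseline


if __name__ == "__main__":
    history = [DailyCost(f"2024-01-0{i}", 10.0) for i in range(1, 8)]
    history.append(DailyCost("2024-01-08", 25.0))
    print(cost_spike(history))  # True, because 25.0 > 1.5 * 10.0
```

A trailing-average comparison is deliberately simple; teams with strong weekly seasonality might compare against the same weekday instead.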


Vendor tooling

Use dedicated (standalone) compute and storage resources for Statsig in your warehouse. This makes total cost of ownership easy to understand from your warehouse's own billing tools. That matters because some cost information isn't visible to Statsig without administrator rights on your warehouse.

If there is a mismatch between Statsig's dashboard and your costs, reach out to support. Statsig takes this seriously and works to resolve the gap.

Tools for managing cost

Follow UI hints

Statsig proactively flags when:

  • A data source is particularly slow.
  • A metric source is malformed (e.g. a view definition which is broken within your warehouse).
  • A data source isn't following best practices.

Acting on these flags quickly prevents wasted compute. In most cases where proof-of-concept (PoC) customers had performance issues, the cause came down to problems such as unpartitioned temp tables for the PoC, or not using macros to filter tables with extremely long data retention.
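To show why partitioning has such a large effect, here is a deliberately simplified Python model of partition pruning. It assumes a hypothetical event table with one 2 GB partition per day and two years of retention, and treats an unpartitioned table as a full scan. Real warehouses prune at a finer grain (for example, Snowflake micro-partitions), so the numbers are illustrative only.

```python
from datetime import date, timedelta

# Hypothetical table: one partition per day, each holding about 2 GB.
PARTITION_GB = 2.0
RETENTION_DAYS = 730  # two years of history


def partitions(end: date, days: int) -> list[date]:
    """List the daily partitions of a table whose newest partition is `end`."""
    return [end - timedelta(days=i) for i in range(days)]


def gb_scanned(parts: list[date], start: date, end: date, partitioned: bool) -> float:
    """Estimate GB read for a query filtered to [start, end]."""
    if not partitioned:
        # Without partitioning, every row must be read before the filter applies.
        return len(parts) * PARTITION_GB
    # With date partitions, only partitions inside the range are read.
    return sum(PARTITION_GB for p in parts if start <= p <= end)


if __name__ == "__main__":
    parts = partitions(date(2024, 6, 30), RETENTION_DAYS)
    window = (date(2024, 6, 17), date(2024, 6, 30))  # a 14-day experiment
    print(gb_scanned(parts, *window, partitioned=True))   # 28.0
    print(gb_scanned(parts, *window, partitioned=False))  # 1460.0
```

In this model, a 14-day analysis reads about 2% of the table when partitions can be pruned, which is the kind of gap that separates a slow, expensive PoC from a cheap one.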

Access controls

For large organizations, most cost issues are caused by poorly optimized source tables, or using the wrong source when pulling data.

Statsig offers access controls so that:

  • Your data team can control data sources and ensure they are well-behaved.
  • Verified sources are clearly indicated in the console for end users.

Following best practices

The best practices cover most situations that lead to high warehouse costs on Statsig. In short, you should aim to:

  • Cluster or partition source tables.
  • Use Statsig date macros to guarantee filter push-down.
  • Use Statsig's Turbo Mode to reduce redundant compute, especially for large or long experiments.
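The date-macro point can be sketched in Python as a simple template substitution. The idea is that the experiment's date bounds are written directly into the source query's `WHERE` clause, so the warehouse can push the filter down to the base table and prune partitions. The macro names below follow the `{statsig_start_date}` / `{statsig_end_date}` style but should be treated as assumptions; confirm the exact syntax on the best practices page. The table and column names are hypothetical, and this is not Statsig's actual rendering logic.

```python
from datetime import date

# Hypothetical metric source SQL; macro names are assumed, not confirmed.
SOURCE_SQL = """
SELECT user_id, event_name, ts
FROM analytics.events
WHERE event_date BETWEEN {statsig_start_date} AND {statsig_end_date}
""".strip()


def render_source(sql: str, start: date, end: date) -> str:
    """Replace the date macros with quoted ISO date literals, a simplified
    stand-in for the templating step that runs before the query is sent."""
    return (
        sql.replace("{statsig_start_date}", f"'{start.isoformat()}'")
        .replace("{statsig_end_date}", f"'{end.isoformat()}'")
    )


if __name__ == "__main__":
    print(render_source(SOURCE_SQL, date(2024, 6, 17), date(2024, 6, 30)))
```

Because the bounds land on the partition column of the base table itself, rather than on the output of an unfiltered subquery, the warehouse can skip out-of-range partitions instead of scanning the table's full retention history.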
