How it’s different
Standard A/B testing relies on a few core assumptions:
- You can reliably and randomly split your users into similar groups
- While each group has heterogeneity inside it, random user-level variations average out at larger sample sizes, resulting in statistically similar groups
- You control how each group gets treated, and there is no interference between groups
These assumptions break down when:
- Your intended treatment can’t be controlled and scoped to the user level
- You don’t know who the users are
- You can’t track outcomes at the user level
For example:
- Some classic marketing channels, like billboards, can’t be randomly split between test and control users at a given location, and we can’t track who sees them or how they act afterwards.
- Some digital campaigns might be able to split users 50/50 (if the platform allows it), but if you don’t have user-level data that you can tie to outcomes of interest, you’ll never know how the two groups performed.
Using Synthetic Controls
“Geotesting”, as we refer to it here, is an experimental framework that relies on a different basic setup than A/B tests:
- Split your users by geographical boundaries at some useful level (zip codes, states, countries, etc.)
- Apply a known treatment in some “treatment” geos, and no treatment in some “control” geos
- Use the control geos to figure out what would have happened in the treatment geos had no treatment been applied (i.e. “synthetic” results)
- Measure the delta in the treatment geos between the actual and synthetic values
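To make the last step concrete, the delta can be written as below. The notation is our own shorthand, not something taken from a particular library: $Y_t$ is the observed outcome in the treatment geos on day $t$, $\hat{Y}_t^{\text{synth}}$ is the synthetic (no-treatment) estimate built from the control geos, and $T_0$ is the day the treatment starts.

```latex
\hat{\tau}_t = Y_t - \hat{Y}_t^{\text{synth}}, \qquad t \ge T_0
\qquad\text{and}\qquad
\hat{\tau}_{\text{total}} = \sum_{t \ge T_0} \hat{\tau}_t
```

Here $\hat{\tau}_t$ is the estimated daily effect and $\hat{\tau}_{\text{total}}$ is the cumulative effect over the post-treatment period.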
Approach
There are many ways to estimate unknown data from other known data, depending on a multitude of factors. In the case of Geotesting, we have:
- time series data that is autocorrelated (i.e. data is correlated from one day to the next within a group)
- time series data that is correlated between groups
- seasonal patterns that repeat at regular intervals
- granular data from many distinct geographies
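The properties above are easy to check on your own data before committing to a model. The sketch below is a minimal illustration on simulated data, not real results: `geo_a` and `geo_b` are hypothetical geos that we assume share a weekly seasonal pattern, and the helper names are ours.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lag_autocorrelation(series, lag=1):
    """Correlation of a series with itself shifted by `lag` days."""
    return pearson(series[:-lag], series[lag:])


# Simulated daily outcomes (an assumption for illustration only):
# both geos follow the same 7-day cycle at different scales, plus noise.
rng = random.Random(0)
days = range(140)
geo_a = [100 + 10 * math.sin(2 * math.pi * d / 7) + rng.gauss(0, 1) for d in days]
geo_b = [50 + 6 * math.sin(2 * math.pi * d / 7) + rng.gauss(0, 1) for d in days]

lag1 = lag_autocorrelation(geo_a, lag=1)  # day-to-day correlation within a geo
lag7 = lag_autocorrelation(geo_a, lag=7)  # week-over-week correlation (seasonality)
cross = pearson(geo_a, geo_b)             # correlation between geos

print(f"lag-1 autocorrelation: {lag1:.2f}")
print(f"lag-7 autocorrelation: {lag7:.2f}")
print(f"cross-geo correlation: {cross:.2f}")
```

A strong lag-7 value points to a weekly cycle, and a strong cross-geo value is what makes it plausible to predict one group of geos from another.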


The overall approach is:
- Split your geos into test and control groups
- Using the pre-treatment period data, train a model on the control data to predict how the treatment geos behave
- Using the post-treatment period data for the control geos, create a “synthetic control” dataset that predicts how the treatment geos would have behaved after the treatment event had no treatment been applied
- Subtract the synthetic data from the actual observed data to determine the induced effects of your treatment on the treatment geos
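The steps above can be sketched end to end on simulated data. This is a minimal illustration under stated assumptions, not a recommended production model: we stand in a single-variable least-squares fit for "a model", treat each group as one already-aggregated daily series, and invent the data (including a known lift of 15 units per day) so the estimate can be compared to the truth. All function names are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n_days=120, treatment_day=90, true_lift=15.0, seed=0):
    """Simulated daily totals for the control and treatment groups (step 1 is
    assumed done: geos are already split and aggregated per group)."""
    rng = random.Random(seed)
    control, treated = [], []
    for day in range(n_days):
        season = 10.0 * math.sin(2 * math.pi * day / 7)  # shared weekly pattern
        lift = true_lift if day >= treatment_day else 0.0
        control.append(100.0 + season + rng.gauss(0, 1))
        treated.append(60.0 + 0.8 * season + lift + rng.gauss(0, 1))
    return control, treated


def estimate_lift(control, treated, treatment_day):
    # Step 2: train on pre-treatment data only (control -> treatment).
    intercept, slope = fit_line(control[:treatment_day], treated[:treatment_day])
    # Step 3: apply the model to post-treatment control data to get the
    # synthetic control, i.e. the predicted no-treatment outcome.
    synthetic = [intercept + slope * c for c in control[treatment_day:]]
    # Step 4: actual minus synthetic gives the estimated daily effect.
    daily_effect = [a - s for a, s in zip(treated[treatment_day:], synthetic)]
    return sum(daily_effect) / len(daily_effect)


control, treated = simulate()
estimated = estimate_lift(control, treated, treatment_day=90)
print(f"estimated average daily lift: {estimated:.2f} (simulated true lift: 15.0)")
```

With real data you would typically fit on many individual control geos rather than one aggregate series, and account for the autocorrelation and seasonality noted earlier when judging whether the estimated lift is larger than normal day-to-day noise.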

