Motivation
The same data can yield very different experiment results because of the wide variety of analysis methodologies available. One of the advantages of modern experimentation platforms is ensuring consistency and transparency in experimental analysis within your organization. This paper is a brief guide to common gaps between platforms, as well as how to identify and resolve them.

General Approach
When companies evaluate an experimentation vendor, it's common to observe differences in results between their in-house platform and the vendor's platform during Proof-of-Concept (POC) validations. We've consistently been able to resolve these gaps with the steps in this document. The high-level hypothesis is that one of the following is true:
- The metric source data is being read, or joined to exposure data, differently, invalidating downstream steps
- Some advanced stats features that are available on the vendor side, but not in-house, are ‘working as intended’, most often reducing the influence of outliers or pre-experiment bias
- There is a misunderstanding of how a metric definition works, or of how an advanced configuration on a metric or experiment behaves

By going through these in order, data teams evaluating a platform can quickly understand and address gaps, or understand a gap and decide whether the vendor's approach is acceptable to them.
Joining Data
In our experience, differences in experiment results most often stem from how exposure data is joined with metric data. At the end of this section we cover a basic check for confirming this isn't occurring.

ID Formats
In some cases, IDs are logged in different formats in different places. For example, the binary ID 4TLCtqzctSqusYcQljJLJE maps to the UUID a0fb4ef0-9d9e-11eb-9462-7bfc2b9a6ff2, so a company might log the binary ID from its production environment while its data users work with the equivalent UUIDs.
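One way to confirm whether two representations refer to the same user is to decode the compact ID back into a UUID. Below is a minimal Python sketch that assumes the compact ID is an unpadded URL-safe base64 encoding of the UUID's 16 raw bytes; the exact encoding varies by company (shortuuid-style base57 alphabets are also common), so treat this as an illustration of the idea rather than the specific scheme used in the example above.

```python
import base64
import uuid

def uuid_to_compact(uuid_str: str) -> str:
    """Encode a UUID's 16 raw bytes as an unpadded URL-safe base64 string."""
    return base64.urlsafe_b64encode(uuid.UUID(uuid_str).bytes).decode().rstrip("=")

def compact_to_uuid(compact: str) -> str:
    """Decode an unpadded URL-safe base64 string back into a canonical UUID."""
    padded = compact + "=" * (-len(compact) % 4)
    return str(uuid.UUID(bytes=base64.urlsafe_b64decode(padded)))

# Round-trip the UUID from the example above to sanity-check the scheme, then
# compare the output against the compact IDs actually stored in production.
compact = uuid_to_compact("a0fb4ef0-9d9e-11eb-9462-7bfc2b9a6ff2")
print(compact, compact_to_uuid(compact))
```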
In that situation, exposures logged with the binary ID cannot join to metric data keyed by the UUID, and results would be empty. As suggested in the 'User Metrics not Calculated' health check, you can pull samples from both the metric source in question and the assignment source or diagnostic logstream to confirm that the identifiers are in the same format.
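As a rough illustration of that sample check, here is a short Python sketch; the sample lists and the second compact ID are hypothetical placeholders, and the format patterns are assumptions about what the two encodings look like:

```python
import re

# Hypothetical samples; in practice, pull a handful of identifiers from the
# assignment/exposure source and from the metric source (e.g. with a LIMIT query).
assignment_sample = ["4TLCtqzctSqusYcQljJLJE", "7hP2mQv9xKwLr3TnYb8c1E"]
metric_sample = ["a0fb4ef0-9d9e-11eb-9462-7bfc2b9a6ff2"]

UUID_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")
COMPACT_RE = re.compile(r"^[0-9A-Za-z_-]{22}$")  # 22-char base64url-style token

def classify(identifier: str) -> str:
    """Label an identifier as 'uuid', 'compact', or 'unknown'."""
    if UUID_RE.match(identifier):
        return "uuid"
    if COMPACT_RE.match(identifier):
        return "compact"
    return "unknown"

for name, sample in [("assignment source", assignment_sample),
                     ("metric source", metric_sample)]:
    print(name, {classify(i) for i in sample})
# If the two sources report different formats, the join will silently match
# nothing; normalize to a single format (or map the IDs) before joining.
```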
ID Resolution can be used to bridge gaps between ID types, but it is not intended to solve this scenario; it is designed to connect identifiers across logged-out/logged-in sessions and other cases where users commingle identifiers by switching between them during the experiment.