Paranoid about uptime? 9 things to do!
Statsig serves billions of individual user interactions. Along the way, we designed the service for reliability and availability of your apps that use Statsig. Because of this, in the case where your application cannot reach Statsig for any reason, your application will continue to work exactly as you expect with locally cached values.
Collected here are a set of best practices that help maximize your uptime - across potential issues including failed client connectivity (to your server), failed server connectivity to Statsig and buggy or deprecated code. You can also read more about how we design for failure.
-
Feature Gates and Experiments have default values that are used in an evaluation . You can disable Feature Gates and set a default value for it; however if your SDK has no information this will default to false. For experiments you specify default values in code. Validate these work and don’t break the experience.
-
Test your code with all the possible assignments in Experiments and Feature Gates. Use overrides to easily do this, along with the inline, real time diagnostics. Don't roll out your experiment and then find a variant (group) that crashes.
-
Use Statsig's support for pre-production environments (e.g. Dev, Staging) as part of your validation process. Pre-production environments can remove change approval requirements, allowing swifter iteration.
-
Start small, validate and then ramp. With feature gates we recommend rolling out to 2% (check for crashes/obvious bugs) before ramping up to 10%, 50% and then 100% while watching metrics you measure. You can also have Statsig fire Metric Alerts when thresholds are violated.
-
Clean up your code and remove features you've finished launching. This ties back to items #1 and #2; you don't want to leave potential code paths you're not testing/monitoring that can be triggered.
-
Use change management on Statsig in production. Changes should be approved by a reviewer. For critical areas, you can enforce an Allowed Reviewer group that has enough context to decide. Statsig Feature Gates allow you to easily audit and roll back changes.
-
Caching on client SDKs: Initializing Statsig client SDKs requires them to connect to Statsig and download config. Client SDKs can cache and reuse config (for the same user) if they are offline. You can also choose to bootstrap your client from your own server (and remove the round trip to Statsig) by using client SDK bootstrapping.
-
Caching on server SDKs: Initializing Statsig server SDKs requires them to connect to Statsig and download config. If connectivity to Statsig fails, initialization fails (falling back to default values). Remove this dependency when connectivity fails by providing this config locally. Read more about dataAdapter
-
Test for failure conditions explicitly (e.g. no Statsig client or server connectivity). Run a disaster simulation (e.g. break DNS routing to Statsig within your data center) and test client app behavior when Statsig is unreachable. Also understand and embrace the variety of capabilities we’ve built to support different kinds of testing.