Subjecting systems to failures is supposed to increase confidence in their stability. But why? How do you form failure hypotheses? How do you reason about their safety? Why should your organization listen to you and invest in testing your failure hypotheses?
These are some of the questions I faced during my quest to improve production stability at work. In this talk, we will discuss three questions:
How to form better hypotheses, and in particular, how to separate opinions from hypotheses? I will share some examples of overly simplistic hypotheses whose testing may not produce the desired outcomes.
How to push safety boundaries so that you can increase the riskiness of your hypotheses? I will outline some tenets for increasing system safety that need to be in place before you can increase the riskiness of your hypotheses.
How to influence better trade-offs between investing in chaos engineering and everything else? How to release the constant trade-off tension that exists in most organizations so that you, as an engineer, can influence your organization to invest in chaos testing?
Subbu Allamaraju @sallamar
Subbu Allamaraju is a senior technologist at the Expedia Group, where he is leading a large-scale migration of Expedia Group's travel platforms from enterprise data centers to a highly available architecture in the cloud. Subbu is a well-rounded engineer and influencer with hands-on experience in software development, architecture, distributed systems, services, internet protocols, operations, and the cloud. Over the past several years, he has helped build and empower several engineering and operations teams in these areas.