Failure Happens: Improving Incident Response in Large-Scale Organizations

10.29.2017 at LISA

San Francisco

Damon Edwards (Rundeck, Inc.)

Deployment is a solved problem. Yes, there is still work to be done, but the operations community has successfully proven that we can both scale deployment automation and distribute the capability to execute deployments. Now, we have to turn our attention to the next critical constraint: What happens after deployment?
We all know that failure is inevitable and can arrive at any moment. How do we respond quickly and effectively to those failures? What works for a small set of teams or an isolated system quickly breaks down as the organization grows in size and complexity. On the other hand, what has commonly been practiced in large-scale enterprises is proving too cumbersome, too silo-dependent, and simply too slow for today's business needs.
How do we rapidly respond to incidents and recover complex interdependent systems while working within an equally complex and interdependent organization? How does operations embrace the DevOps and Agile inspired demand for speed and self-service while maintaining quality and control?
This talk examines the trial-and-error lessons learned by some forward-thinking enterprises who are currently streamlining how they:
- Resolve incidents
- Reduce friction between teams
- Divide up operational responsibilities
- Improve the quality of their ongoing operations

See how these companies are rethinking how and where operations happens by applying Lean and DevOps principles mixed with modern tooling practices.