Commercial aviation, civil and structural engineering, emergency medicine, and the nuclear power industry have all accumulated hard-earned lessons over histories stretching back decades or even centuries. Often acquired at a bloody cost, these experiences drove the development of environments typified by stringent regulation, strict design and testing protocols, and demanding training and education requirements—all motivated by the need to minimize loss of life.
In stark contrast, the computer industry in general, and systems administration in particular, have developed in a relatively unconstrained environment, largely free—outside of a few niche fields—from the regulation and external oversight common in safety-critical disciplines.
Despite these differences, these far more demanding environments offer many lessons that systems administrators, designers, and engineers can apply to the design, development, and operation of computing systems.
We will examine incidents ranging from Air France 447 to Three Mile Island, and what we can learn from the experiences of those involved both in the incidents themselves and in the subsequent investigations. We will draw parallels between our field and these less forgiving ones in areas such as education and training, monitoring, design and testing, human-computer and human-systems interaction, human performance factors, organizational culture, and team building.
We hope you will take away not just a set of object lessons but also a new lens through which to view the work you do and the environment in which you do it.