EP.25: MELTDOWNS: CHRIS CLEARFIELD ON PREVENTING SYSTEM FAILURES

Chris talks with Larry Weeks

Almost none of these crises really come out of nowhere…
You don’t necessarily have to predict the exact failure to get better at managing these kinds of failures – Chris Clearfield

What do the 2008 financial crisis, the Fukushima nuclear accident, Three Mile Island, Deepwater Horizon, and the 2017 Academy Awards “best picture” Oscar ceremony all have in common?

The small things. Or rather, lots of tightly coupled small things that are overlooked due to complexity or ignored due to culture.

How to “build” an accident

In Deep Survival, Laurence Gonzales writes that accidents don’t just happen; they have to be assembled piece by piece: “And if just one single piece is missing, the accident simply doesn’t happen.”

Risk is unavoidable but accidents aren’t; the pieces have to be in place.

Accidents don’t just happen; they are assembled carefully, piece by piece.

That’s why I’ve been looking forward to talking with Chris Clearfield. He’s a science geek, a reformed derivatives trader, and the founder of System Logic, an independent research and consulting firm dedicated to understanding risk and its interaction with organizational factors. He’s also the co-author, with András Tilcsik, of Meltdown: Why Our Systems Fail and What We Can Do About It.

On this episode Chris shares some interesting research providing insight into the risks posed by complexity.

Deepwater Horizon accident

In this New York Times article about the Deepwater Horizon accident, the writers provide meticulous detail as to what happened based on thousands of documents and sworn testimony. This quote from that article is telling.

They were also frozen by the sheer complexity of the Horizon’s defenses, and by the policies that explained when they were to be deployed. One emergency system alone was controlled by 30 buttons. (Emphasis mine)

Something good, like organizing systems, technology, or management hierarchies, can turn into something very bad when those systems become too complex or too layered.

Chris says this can be true at the organizational and institutional level, where risk and complexity, combined with corporate culture, increase both the likelihood and the total impact of a failure, leading to catastrophe (a meltdown).

On the podcast we discuss the emergent properties of these system-wide failures, emergent in the same way the 2008 financial crisis was “of the system and not an anomaly.”

Chris calls some of these meltdowns “normal accidents,” referencing the book by Charles Perrow. These are accidents that are a fundamental property of the system itself rather than the result of a single component failure or a specific human error.

The same kind of culture and decision making that led to the financial crisis also led to BP – Chris Clearfield.

But our conversation wasn’t just about system failures and why they occur; it was also about what we can do about them.

A few takeaways for me

1. Beware the small domino.

You’ve seen demonstrations like this all over the internet: a very small domino knocks over one about 1.5x its size, which knocks over the next, and so on, until the chain reaction topples a final domino hundreds of pounds heavier than the first, all from a minuscule push.

Chris says you need the ability first to see – then manage – the small problems before they become big problems.

2. Checklists and pre-mortems

As to point #1, find the problems that might lead to disaster. Create a checklist culture and audit your systems or projects.

What would the really bad outcomes you’re worried about depend on to happen? What bare-minimum “thing” or things have to be in place for something to go wrong?

Hence the pre-mortem. Instead of waiting for the scary event, imagine the project or product has already failed, that the bad “thing” has happened, then work backward and document what could have led to that failure.

That’s your checklist.

Then go back and try to mitigate those things, those pieces, one by one.

What would have to be in place for something terrible to happen?

Chris says you should start auditing the systems, procedures, and other things you haven’t been measuring. What and where are the tightly coupled pieces? How many things across all your ‘systems’ have to go exactly right for you to succeed, and what goes wrong if they don’t?

3. Transparency as insurance and competitive advantage

Having an open culture is more than an attractive mission statement; it’s a preventative measure against disaster.

A willingness to discuss mistakes without berating people is an effective way to find issues while they are still small and to mitigate the risks of complexity. Create a learning culture around failure. Create open feedback loops.

(For more on how companies are really doing this, listen to my chat with Ashley Good)

Some of the other subjects we discuss include:

  • How some of these companies handled or weathered different crises much better than others
  • Tight coupling and small failures
  • Cognitive biases and meltdowns
  • Black Swan events—and finding the feathers
  • Pre-established criteria in decision making
  • The value of dissent
  • Power cues—including a fascinating example Chris gives of a study on physicians’ body language with patients
  • The S.P.I.E.S. tool, which goes hand in hand with the Annie Duke episode on Thinking in Bets, if you’re curious and want to listen to that

Hopefully, I’ve piqued your curiosity, and you find my conversation with Chris as interesting as I did.

Give it a listen.

Resources 

Meltdown: Why Our Systems Fail and What We Can Do About It by Chris Clearfield and András Tilcsik

System Logic Inc

Normal Accidents by Charles Perrow

Thinking in Bets by Annie Duke
