That being said, let’s go learn something new, and pick up where we paused.
Before you even start: The rules of the Fight Clu… I mean RCA
RCA is like Fight Club (you remember the movie, right?): it has rules.
The first rule of RCA is… whatever happened, it is never a human's fault.
The second rule of RCA is… whatever happened, IT IS NEVER A HUMAN'S FAULT!
The third rule of RCA is… whatever happened, it is always the company's fault.
The reason is simple, yet often overlooked. The company owns all the means, tools and resources necessary for improvement. As a matter of fact, employees have nothing but the frontline knowledge. So if there is something missing along the process – the company is the one to turn to.
Imagine a hypothetical situation: some random employee contributed, partially or entirely, to some sort of failure, and the Root Cause Analysis names this Human Fault as the Root Cause.
RCA is intended to improve or remove Root Causes. If a person were the Root Cause, then in the best-case scenario this employee would be sent on some kind of training. Still, that risk “mitigation” is now even more person-dependent than before.
In the worst-case scenario this employee would be fired. The person with the most knowledge and experience is now gone, but the problem remains the same. This is what avoiding the real Root Causes looks like, because nothing protects the company from an organizational point of view.
Additionally, this would lead to one 100% certain result: the organization would lose the opportunity to conduct another RCA, like forever. Probably the same goes for all other inspect-and-adapt methods. Not to mention that the RCA Driver would become the most hated person in the organization. Well deserved, to be honest.
Build trust. And protect this trust.
Root Cause Analysis requires TRUST. So the RCA Driver (or whatever you call that person) needs to ensure that management knows, understands and aligns with the three rules of an RCA. And the RCA Driver needs to build and maintain the trust within the team along the entire process.
Trust allows people to open up. It is stunning how many details people share when they trust that their insights will clearly serve the improvement and that nobody will be harmed as a result. They discuss different viewpoints without hesitation, correct each other, and fill in the blank spots in the analysis.
Furthermore, awareness of their influence on the situation, and the conviction that management is really, REALLY willing to openly analyze the situation and adjust accordingly, make people committed to finding the right answers.
And they really do commit. And they do not back down from admitting their mistakes, because they feel safe and know it has a purpose.
So, as an effect of an honest analysis done by a committed team of experts engaged in the situation, results come in the end. And they are often eye-opening and jaw-dropping.
Some readers will probably recognize this pattern: it’s the pyramid of the five dysfunctions of a team, but turned positive. People who trust speak up and really discuss. They not only commit to finding the answers and solutions, they also feel accountable, keeping in mind that the organization invests a lot in getting better and that they are the ones who can contribute the most.
4 steps of an effective Root Cause Analysis
Root Cause Analysis has four steps, and they come in order.
- Define the SPECIFIC problem. (#focus)
For example: why the VERY LAST release failed. If we analyze why several of our releases suck, we dilute the analysis. In reality, some findings might apply to just one particular release.
- Gather Data and perform HONEST analysis (#courage)
Now it’s definitely time to air the dirty laundry. Corner-cutting, company politics, improper behaviours, lack of procedures, deadline worshipping: get ready to dig for all of it.
- Formulate RELEVANT improvements (#openness)
Weeeeell, it’s a great temptation to push through some improvements (most probably long forgotten in the backlog) under the cover of RCA, but we need to be crystal clear and focus on what REALLY matters in that particular situation.
- Make a PLAN and IMPLEMENT those improvements (#commitment)
As mentioned before, even the best analysis is pointless without follow-up improvements.
We look for Root Causes. When is a “root cause” a root cause?
First of all – it is something tangible and concrete.
“POOR COMMUNICATION” is not a (good) Root Cause. “Lack of status syncs during the 2-week period before the release” is much better. We can do something addressing that directly.
Second – it is under control of the Company.
A lightning strike that burns our main server and causes downtime is not a root cause (we have limited influence on thunderstorms). However, a lack of proper anti-surge mechanisms in our network, a lack of a BCM (Business Continuity Management) plan, or simply a missing lightning rod is.
Third – it actually IS a reason for the failure.
Fourth – we can actually re-create the cause-and-effect relationship between the cause and the failure.
Two things are worth mentioning. First, a Root Cause is usually pretty simple (much simpler than the final effect(s)), and therefore often overlooked. Second, Root Causes are like wolves – they usually emerge in packs 🙂
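If you like checklists, the criteria above can be sketched as a tiny Python helper. Treat it as an illustration only: the class name `CandidateCause`, its field names, and the True/False values are assumptions made for this sketch. The answers themselves are judgment calls the team makes during the analysis, not something the code can work out.

```python
from dataclasses import dataclass


@dataclass
class CandidateCause:
    """A candidate root cause plus the team's answers to the four questions.

    All names and values here are hypothetical, for illustration only.
    """
    description: str
    tangible: bool            # something concrete we can act on directly
    company_controlled: bool  # under control of the company
    actual_reason: bool       # it really is a reason for the failure
    reproducible_link: bool   # cause-and-effect chain can be re-created

    def failed_criteria(self) -> list[str]:
        """Return the names of the criteria this candidate does not meet."""
        checks = {
            "tangible and concrete": self.tangible,
            "under control of the company": self.company_controlled,
            "actual reason for the failure": self.actual_reason,
            "re-creatable cause and effect": self.reproducible_link,
        }
        return [name for name, passed in checks.items() if not passed]

    def is_root_cause(self) -> bool:
        """A candidate qualifies only if it meets every criterion."""
        return not self.failed_criteria()


# Examples taken from the text; the boolean answers are assumed.
candidates = [
    CandidateCause("Poor communication", False, True, True, False),
    CandidateCause(
        "Lack of status syncs during the 2-week period before the release",
        True, True, True, True,
    ),
    CandidateCause("Lightning strike burning the main server",
                   True, False, True, True),
    CandidateCause("Missing lightning rod", True, True, True, True),
]

for cause in candidates:
    verdict = "root cause" if cause.is_root_cause() else "keep digging"
    print(f"{cause.description}: {verdict} {cause.failed_criteria()}")
```

The point of listing the failed criteria instead of returning a bare yes/no is that a rejected candidate tells you where to dig next: “Poor communication” fails on being tangible, so the next question is which concrete sync or handover was missing.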
Summary of this part
Again, it is time to stop unravelling the Root Cause Analysis theory and get some fresh air before we lean into the final part.
You are familiar with the RCA rules now. With this in mind, you know why you should never allow yourself to even think of human fault(s). You know what the steps of an effective RCA are, and you are able to identify the real Root Causes.
So see you in the last part of this mini series 🙂