No, Seriously. Root Cause Is a Fallacy

zug_zug · on April 8, 2018

This is a deliberately click-bait title.

It intentionally misuses the word fallacy, when it actually means that root cause analysis is an essential step in the correct direction and must go even further in that direction. All it's really saying is "cause should be plural in root cause analysis."

The fact that perhaps an incident had multiple simultaneous causes doesn't invalidate root-cause-analysis. The fact that 5-whys diverge after level 3 or 4 means that once you get to a high-enough level there are numerous solutions on how to prevent a problem (to use the car-battery example, there are dozens of solutions: the car could show a message when the alternator doesn't work, the dealer could call the car-owner at service times, they could invent a belt that doesn't ever break, etc).

bayonetz · on April 8, 2018

Yep, this article’s framing is poor. The 5 Whys are just a simplification that proved useful for getting people who previously couldn’t be bothered to dig deeper. There is nothing stopping you from turning a linear sequence of questions into a fanning-out tree of questions so that you can go both deep and wide. ...well, nothing other than time and resources. It’s like suggesting that riding a bicycle is a fallacy and that if you really want to get somewhere fast, you need to take an airplane. It’s true that bicycles are slower, but they are usually better than walking, and the cost/benefit ratio is often better than taking a flight.
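To make the "linear sequence vs. fanning-out tree" contrast concrete, here is a minimal Python sketch. The `Why` class, both helper functions, and the example answers are hypothetical illustrations (loosely modeled on the car-battery example mentioned above), not anything taken from the article.

```python
from dataclasses import dataclass, field


@dataclass
class Why:
    """One answer to "why?", with any number of deeper answers beneath it."""
    answer: str
    causes: list["Why"] = field(default_factory=list)


def linear_chain(node: Why) -> list[str]:
    """Classic 5 Whys: follow only the first cause at each level."""
    chain = [node.answer]
    while node.causes:
        node = node.causes[0]
        chain.append(node.answer)
    return chain


def all_paths(node: Why, prefix: tuple[str, ...] = ()) -> list[list[str]]:
    """Fanned-out version: every path from the incident down to a leaf cause."""
    path = prefix + (node.answer,)
    if not node.causes:
        return [list(path)]
    paths: list[list[str]] = []
    for cause in node.causes:
        paths.extend(all_paths(cause, path))
    return paths


# Hypothetical tree of answers; the wording is made up for illustration.
tree = Why("car won't start", [
    Why("battery is dead", [
        Why("alternator not charging", [
            Why("alternator belt broke", [
                Why("belt past service life"),
                Why("no warning shown to driver"),
            ]),
        ]),
        Why("battery is old"),
    ]),
])

if __name__ == "__main__":
    print(" -> ".join(linear_chain(tree)))
    for p in all_paths(tree):
        print(" -> ".join(p))
```

The linear chain is just one of the paths through the tree; going wide costs more time because every extra branch has to be investigated too.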

bandrami · on April 8, 2018

Aristotle's distinctions among types of causes are, I think, very useful compared to root cause analysis. It's not that some why's are more important than others in an absolute sense. Rather than having four different "why?" questions one after the other, he pointed out there are four kinds of "why?" you can ask simultaneously, and depending on what your needs are one may be more important than another at a given time.

So, for "why is the table flat?", the four causes are:

1. (Material cause) because it is made of wood, and wood is rigid, and so holds its shape.

2. (Efficient cause) because a carpenter took a plane or chisel and cut away all the wood that wasn't the flat surface.

3. (Formal cause) because the blueprint the carpenter used was of a flat rigid surface suspended over 4 legs.

4. (Final cause) because if it were not flat, your jug of wine would fall over when you set it down on it.

This is in some ways superficially similar to the "five why's", but it is more flexible, because you do not have to go in that order. You might look at two tables and ask why this one is flat and this other one is curved, and realize that it's because this other one is made of canvas rather than wood, and the canvas has started sagging. This framework also lets you categorize each type of "because" as you receive it.

jspaw · on April 8, 2018

A level floor doesn’t have influence on the table being flat?

Does the observer’s perspective being parallel with the table’s surface have anything to do with it?

When viewed through a microscope, is it “flat”?

Cause is constructed, not objectively and comprehensively identified.

w4tson · on April 8, 2018

This is cool. Where did you learn this?

bandrami · on April 8, 2018

I majored in classical philosophy.

This is actually a fairly common background for sysadmins, I've found, anecdotally.

gumby · on April 8, 2018

This article could better have been entitled, "Root cause is overrated". It's not always the right way to address a problem (a good example is modern medicine, which is almost always about addressing symptoms).

The best part of this article is "5W tends to diverge after the second or third question."

zug_zug · on April 8, 2018

Divergence isn't a problem though. Divergence shows how complex cause and effect are, and how many different ways there are to prevent a problem.

If it diverges, then people simply need to get in a room and list the tradeoffs of different solutions (robustness, cost, reusability).

drawkbox · on April 8, 2018

> One root cause implies one problem with one answer.

I don't know that root cause was ever ONE thing. The root cause(s) can be many roots within the root cause.

A tree has a trunk, it goes down to the root, and in some instances there are multiple roots.

The 5 Whys, 5-Ms, 8-Ps and 4-Ss help determine root cause(s). The idea of root cause is to find all the real reasons something isn't going as planned. Many times it is one thing, but sometimes it isn't. There are also processes that need retrospectives if the root causes are not fully solved or continue to happen; in that case the true root cause is still out there. When you only find part of the root cause you haven't found the root cause fully.

Root cause is not a fallacy. If it isn't just one thing, you just haven't found the root cause(s) fully yet, only a symptom, and you need to go deeper, or the roots have grown/changed.

In logic, root cause is usually easier to find because it is more objective, in policy there is a constant of misunderstanding that is rarely factored in enough or is caused by unknowns that can be ever-changing.

Sometimes, especially in policy, the cause can never be solved and it should be accepted as a known part of the solution or accepted as a constant problem. Example: to stop drug addiction we will make all drugs illegal, or to stop alcohol addiction we will prohibit alcohol, yet people will continue to do them because of the human condition. These are clear examples of root causes that have deeper root causes. Some causes don't need solving or can't be solved due to misunderstanding; they need objective acceptance of them always being part of the system, and solutions within that reality. Another example is hacking or fraud: those expectations have to be built into the system. They aren't going away, because they evolve around the root cause solutions put in place, in a cat-and-mouse game ad infinitum.

ademup · on April 8, 2018

Provocative title, but I am unconvinced. The 5W example seems completely valid as it stands, and the author's attempt only leads to (to my mind) another valid version of 5W. Even if the article had won me over, I would change the title to 'Root Cause can sometimes provide insufficient results'.

That said, I do agree with the author on one thing: gif with a soft 'g'.

kirykl · on April 8, 2018

The fish bone diagram in addition to 5W can help account for multiple factors of root cause https://en.m.wikipedia.org/wiki/Ishikawa_diagram

jspaw · on April 8, 2018

No, they can’t. Both approaches are linear and sequential chains of cause/effect, which do not work with complex adaptive systems.

Five whys give a cherry-picked paucity of data in an investigation: https://www.oreilly.com/ideas/the-infinite-hows

Joeri · on April 8, 2018

That five why's ends up with a different resolution depending on who you ask doesn't need to be a problem. You're not trying to fix all causes, just avoid the specific problem you had from reoccurring. All you need to find is _a_ cause that when addressed will ensure the problem does not reappear in the same form. In fact, it is better if multiple independent causes can be found because then multiple causes can be addressed.

zzzcpan · on April 8, 2018

Preventing a specific problem from reoccurring is not the goal either, though. What is the root goal? Why do you want to prevent a specific problem from reoccurring? Maybe you just want to limit the scope of the problem, or of all such problems, and maybe that will have a much better overall effect on the system.

I think the point the author tries to make is that root cause analysis is a broken thinking model that doesn't lead to actual quality improvement.

Rapzid · on April 8, 2018

Agreed; that leads to the quick fix that may prevent a specific incarnation of a problem, but often it does nothing to advance mitigation against entire classes of problems. In fact, it often increases technical debt. How someone approaches bug fixes is a good filter to separate the wheat from the chaff.

konschubert · on April 8, 2018

For me, when I talk about "root cause", what I mean is:

What is the most generic thing, that, if changed, would have prevented that problem?

Generic things are: Team setup, planning process, development process, choice of framework and programming languages, coding conventions, ...

Less generic things are: Database constraints, ORM validation, ...

Even less generic: Server side API validation, frontend validation, a frontend bug that caused a bad value to be submitted, ...

From this comparison it is clear that all levels of genericness have to be correct and need to be fixed if necessary. The question for the root cause simply encourages you to follow the lead all the way up the chain of genericness.

eesmith · on April 8, 2018

The example I give of a problem in 5-Whys analysis deals with Snowden's leak of classified materials.

One RCA ends up dealing with how computer system security doesn't match up with organizational security, another ends up with the problems of outsourcing security, a third concerns the dissonance between internal and public views of American military policy, and so on.

Every single one ends up with a very different resolution. As the article quotes, "5W tends to diverge after the second or third question."

sokoloff · on April 8, 2018

We saw some convergence on the 4th or 5th why frequently being “Because we were lazy...”

draugadrotten · on April 8, 2018

The purpose of a root cause analysis is often not to find the {only,all} root cause, but to prevent the incident at hand from repeating by removing some factor that is necessary for it to occur.

Event Y happens if and only if factor X is present. Factor X is what the RCA is supposed to find in order to prevent event Y. There may be many factors which satisfy this equation, which means there may be multiple RCAs which are correct and functional in achieving our goal: to prevent event Y.
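One way to read this: given the set of factors present during the incident and some model of when the event occurs, every factor whose removal alone stops the event is a valid target. The Python sketch below illustrates that reading; the `necessary_factors` helper, the `outage` model, and all factor names are hypothetical and invented for this example.

```python
from typing import Callable


def necessary_factors(
    present: frozenset[str],
    occurs: Callable[[frozenset[str]], bool],
) -> list[str]:
    """Return the factors whose removal alone would have prevented the event."""
    if not occurs(present):
        raise ValueError("the event does not occur with the given factors")
    return sorted(f for f in present if not occurs(present - {f}))


def outage(factors: frozenset[str]) -> bool:
    # Hypothetical model of event Y: an outage needs a bad config pushed
    # without review, plus either missing validation or a muted alert.
    return (
        "bad_config" in factors
        and "no_review" in factors
        and ("no_validation" in factors or "alert_muted" in factors)
    )


# Everything that was true at the time of the (made-up) incident.
present = frozenset({"bad_config", "no_review", "no_validation", "alert_muted"})

if __name__ == "__main__":
    print(necessary_factors(present, outage))
```

In this toy model two factors are each individually necessary, so two different RCAs could both be "correct". The other two factors only prevent the event when removed together, which a one-at-a-time check like this does not detect.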

markonen · on April 8, 2018

"What's the root cause of success?" really is a nice and succinct way of illustrating the problem with root cause analysis.

Of course, a million business book writers fancy themselves capable of answering that question, in about 300 pages or so.

phkahler · on April 8, 2018

>> "What's the root cause of success?" really is a nice and succinct way of illustrating the problem with root cause analysis.

No it's not. Success is usually the result of a lot of things going right. The implication is that root cause analysis is like asking why a new business failed vs succeeded. That's what a post-mortem is for. Root Cause Analysis is primarily used to find the cause of a failure in something that is otherwise working normally. Why did the rocket explode? Why did the aircraft go down? Why did the car have sudden uncontrolled acceleration? The results of an RCA are specific and actionable things that can prevent future occurrences. There may be multiple "contributing factors" discovered along the way, and it's sensible to address those as well. Nobody says to ignore all contributing factors along the way.

The author also takes the failed car example and proposes a bunch of things that were not the problem and acts as if they were overlooked in asking 5 whys. Each of the why questions had a specific answer in the example, but there may have been other things looked at prior to finding each answer. He suggests that other things may have been ignored while overlooking the specific actionable items that were identified.

zzzcpan · on April 8, 2018

Ultimately the role of root cause analysis is to help reach said success. There is no other reason to adopt this technique, hence the fallacy.

URSpider94 · on April 8, 2018

No. That’s like saying one could conduct a root cause analysis on a pile of parts on the ground, asking why they haven’t formed themselves into a car.

Root cause analysis, by its definition, assumes that you are starting with a functioning system (turns inputs into the proper outputs) that failed unexpectedly.

zzzcpan · on April 8, 2018

Functioning system doesn't imply success. You believe you can reach success by adopting certain techniques for solving problems. Doesn't mean they are net positive for it though.

phkahler · on April 9, 2018

>> Functioning system doesn't imply success.

It is almost a definition of success. One can refine it to include functioning under a range of conditions, or at a desired price point, or something else. But a system that functions as desired is the end goal of all engineering projects. Even if you're most of the way there in development, root cause analysis can often be used to figure out why something did not perform as desired.

icegreentea2 · on April 8, 2018

I think root cause analysis is perfectly capable of scaling to the type of system that we're talking about here. It just means you need to iterate and backtrack. Maybe consider something like fault tree analysis or whatever, but tailored for your system.

Yeah, we get it, complex systems fail due to multiple failure points. So iterate your root cause analysis until you find multiple failure points.

The entire point of root cause analysis is to stop you from mitigating the wrong things without thinking. Where you mitigate is purely a business value decision. Sometimes mitigating at a higher level (either by stopping your root cause analysis early, or by deciding that mitigating at a lower level is not possible or desirable) is the way to go. Case in point, the root cause of my car (I live in Canada) rusting out early is that we salt the shit out of our roads every winter and that my car is made of corrodible materials. I mitigate that by applying undercoating that I have to reapply, instead of addressing the root cause (I live in a terrible climate, and my car is made of metal).

Having the causal tree of events is only useful when paired with the set of possible solutions.
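A rough Python sketch of that pairing, using the rusting-car example: each cause in the chain optionally carries a candidate mitigation, and the choice of where to mitigate is made on cost. The class names, the `cheapest_mitigation` helper, and every cost figure are hypothetical placeholders, not real data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mitigation:
    name: str
    yearly_cost: float  # hypothetical figure, for illustration only


@dataclass
class Cause:
    description: str
    depth: int  # 0 = the symptom itself, larger = closer to the "root"
    mitigation: Optional[Mitigation] = None  # None = nothing practical to do


def cheapest_mitigation(chain: list[Cause]) -> Optional[Cause]:
    """Pick the cause whose mitigation costs least, at whatever depth it sits."""
    options = [c for c in chain if c.mitigation is not None]
    if not options:
        return None
    return min(options, key=lambda c: c.mitigation.yearly_cost)


# Causal chain for the rusting car, with made-up costs.
chain = [
    Cause("car rusts out early", 0, Mitigation("reapply undercoating", 150.0)),
    Cause("roads are salted every winter", 1),
    Cause("car is made of corrodible metal", 1,
          Mitigation("buy a car with a non-corroding body", 5000.0)),
    Cause("live in a harsh winter climate", 2,
          Mitigation("move somewhere warmer", 20000.0)),
]

if __name__ == "__main__":
    choice = cheapest_mitigation(chain)
    print(choice.description, "->", choice.mitigation.name)
```

With these placeholder costs the cheapest option sits at the shallowest level, which matches the undercoating choice described above. A different cost model could just as easily favor a deeper fix.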

mcqueenjordan · on April 8, 2018

I find the arguments weak and lacking convincing power. Each has a pretty obvious gaping flaw. The author seems to depend upon RCA being executed poorly. For instance, “we take short-cuts and the investigation for the root cause is shallow.” If you begin with the assumption that all investigations are shallow and bad, well of course you’re going to conclude that RCA is bad.

Title claims it’s a fallacy but all the fallacious thinking I see is in its arguments.

tyldum · on April 8, 2018

I tend to aim for two solutions: how can we prevent this from happening and how can we detect this kind of failure ahead of time (to alert).

j_m_b · on April 8, 2018

'No. Wrong.' Such binary thinking. Reminds me of the quote "Google uses Bayesian filtering the way Microsoft uses the if statement", which points out how unsophisticated purely binary statements are. The Jains captured this idea in the philosophy of Anekāntavāda: that reality is multi-faceted. The philosophy teaches one to approach a situation and make statements such as "in some ways, it is", "in some ways, it is not", "in some ways, it is, and it is not"...

toss1 · on April 8, 2018

This article explains nicely why, despite a strong bias to seek deep causes, I've always felt skeptical of formal RCA.

It's the formality and constraints. The key is in the example of the '5 Whys' exercise diverging after 2-3 steps if people do it individually & separately. It is simply too confining to seek a single cause -- occasionally 'true' but rarely complete.

Seems a good first improvement could be to implement independent/separate 5W analysis as an early step, then use those as a broader roadmap.

chacham15 · on April 8, 2018

The problem with the analysis that he is talking about is that it tends to lead to too many problems. Is this situation worth that amount of effort to fix? If it is, then in doing an RCA we would have looked for those additional measures; if not, we've saved ourselves a lot of time and argument about whether or not something contributed to the problem.

karmakaze · on April 8, 2018

My problem with root cause thinking is the belief that there is only one thing that needs fixing when there are many contributing factors. Fix one thing and you have a system that's just stable enough, not more.

ybrah · on April 8, 2018

No, seriously. Thing is a thing.