Root Cause Analysis: The 5-Whys

Deciding which problem to solve is one of the hardest steps to get right in Data Science. If you get it wrong, you’ll spend significant amounts of time freewheeling through the rest of the data science process and end up with something that nobody wants or cares about. There is nothing worse than someone suggesting that your work has no value. The solution is to define the correct problem at the start.

One way to do that is to use one of several root cause analysis techniques. The simplest and one of the most effective in your day-to-day work is to ask “why?” five times.

Each time you ask, “yeah, but why?” you move down another level towards the root cause. Let’s take an example.

Client: “I want to fix our sales pipeline. Can you do it?”

You: “Sure thing! But why do you have problems with your sales pipeline?”

Client: “Because we’ve just released a new product and nobody is buying it.”

You: “Why is nobody buying it?”

Client: “I don’t know, that’s your job!”

You: “Sure, but I’m trying to figure out what you have or haven’t done. Why is nobody buying it?”

Client: “Well, people don’t yet know that our product exists.”

You: “And why don’t they know yet?”

Client: “Because we haven’t really started doing any marketing yet.”

In this fictitious but fairly common scenario, the client’s sales pipeline was empty because nobody knew about their product. It took several whys to get to the root cause of the sales problem, which turned out (as it often does) to be a marketing problem.
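If it helps to keep track of the questions during a conversation like this, the exchange can be written down as a simple chain of question and answer pairs. The following is only a minimal sketch: the `Why` class and the `root_cause` helper are hypothetical names made up for this illustration, and treating the last answer as the root cause is an assumption that holds only if you stopped asking at the right point.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Why:
    """One round of the conversation: a 'why?' question and the client's answer."""
    question: str
    answer: str


def root_cause(chain: List[Why]) -> str:
    """Return the last answer in the chain, treated here as the candidate root cause."""
    if not chain:
        raise ValueError("ask at least one why")
    return chain[-1].answer


# The sales pipeline conversation, keeping only the whys that got a useful answer.
chain = [
    Why("Why do you have problems with your sales pipeline?",
        "We've just released a new product and nobody is buying it."),
    Why("Why is nobody buying it?",
        "People don't yet know that our product exists."),
    Why("Why don't they know yet?",
        "We haven't really started doing any marketing yet."),
]

for depth, step in enumerate(chain, start=1):
    print(f"Why #{depth}: {step.question}")
    print(f"  -> {step.answer}")

print("Candidate root cause:", root_cause(chain))
```

Writing the chain down like this makes it easy to see where the answers stop being symptoms and start being causes.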

It’s called the 5-Whys because there is anecdotal evidence that it often takes five questions to get to the root cause. It originated in lean manufacturing but is widely applicable in other domains.

However, critics suggest that the method is too simplistic. Often there are multiple root causes (e.g. lack of marketing AND no market for the product), and the 5-Whys gives a false sense of security by suggesting only one. It is also unable to probe deeper than the client’s level of knowledge. You’re assuming that, deep down, the client really knows what’s wrong and just doesn’t realise it. This is not always true.

But personally, I find it very useful for focusing my efforts on finding a suitable problem to solve. Try it next time you start a project. Really ask why you are doing the project. Get to the root cause of the problem.
