102: How to Do a Data Science Project

In this video we will talk about the problems encountered in data science. We'll also discover how data science fits into a process, which you can use as a plan. Finally, we'll look at the impacts of a Data Science project, which will help you avoid common pitfalls.


Problems in Data Science

  • Understanding the problem

  • “the five-whys”

  • Different questions dramatically affect the tools and techniques used to solve the problem.

Data Science as a Process

  • More Science than Engineering

Research Problem Model

  • High risk
  • High reward
  • Difficult
  • Unpredictable

CRISP-DM Process

Image by Kenneth Jensen, CC BY-SA 3.0, via Wikimedia Commons

Impacts of Data Science

  • What is the purpose of the project? Who is affected?

  • Which parts of the business are affected? Do we need help?

  • You must think about the human concerns.

  • You need buy-in from the business; the business will be affected.


Business goals: make money, save money or save time. Data Science generates profit.

Project justifications - you now know how they are judged:

  • Alignment with Business Goals
  • A well-defined, testable requirement
  • A robust plan
    • Data Understanding
    • Data Preparation
    • Modelling
    • Evaluation
    • Deployment
    • Iteration of the above
  • Buy-in and integration with other parts of the business
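
The plan above can be read as a loop: work through the phases, check the result against the testable requirement, and either deploy or iterate. The following is only a minimal sketch of that idea, not a prescribed implementation. The phase names come from the list above; `run_project`, `toy_evaluate`, the percentage score and the threshold of 85 are hypothetical names and values made up for illustration.

```python
from typing import Callable, Dict, List

# Phase names are taken from the plan above; everything else is hypothetical.
PHASES: List[str] = [
    "Data Understanding",
    "Data Preparation",
    "Modelling",
    "Evaluation",
]


def run_project(
    evaluate: Callable[[int], int],
    required_score: int,
    max_iterations: int = 5,
) -> Dict[str, object]:
    """Repeat the phases until the requirement is met or the budget runs out."""
    log: List[str] = []
    score = 0
    for iteration in range(1, max_iterations + 1):
        # Record one pass through every phase of the plan.
        for phase in PHASES:
            log.append(f"iteration {iteration}: {phase}")
        # The testable requirement: a measured score compared to a threshold.
        score = evaluate(iteration)
        if score >= required_score:
            log.append(f"iteration {iteration}: Deployment")
            return {"deployed": True, "iterations": iteration,
                    "score": score, "log": log}
    # Requirement never met: nothing is deployed.
    return {"deployed": False, "iterations": max_iterations,
            "score": score, "log": log}


def toy_evaluate(iteration: int) -> int:
    # Made-up stand-in for a real evaluation: the score (in percent)
    # improves by 10 points on every pass, starting from 70.
    return 60 + 10 * iteration


result = run_project(toy_evaluate, required_score=85)
print(result["deployed"], result["iterations"], result["score"])
```

In a real project the evaluation step would measure the model against the agreed requirement, and the iteration budget would reflect how much risk the business is willing to accept.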

However, there are also philanthropic and scientific reasons for undertaking a project, so these arguments may not apply directly to charitable causes or academia.