102: How to do a Data Science Project

Problems in Data Science

  • Understanding the problem

  • The “five whys”: keep asking “why?” until you reach the root problem

  • Different questions dramatically affect the tools and techniques used to solve the problem.


Data Science as a Process

  • More Science than Engineering

Research Problem Model

  • High risk
  • High reward
  • Difficult
  • Unpredictable

CRISP-DM Process

[Figure: CRISP-DM process diagram. By Kenneth Jensen, CC BY-SA 3.0, via Wikimedia Commons]


Impacts of Data Science

  • What is the purpose of the project? Who is affected?

  • Which parts of the business are affected? Do we need help?

  • You must think about the human concerns.

  • You need buy-in from the business; the business will be affected.


Conclusions

Business goals: make money, save money or save time. Data Science generates profit.

Project justifications - you now know how they are judged:

  • Alignment with Business Goals
  • A well-defined, testable requirement
  • A robust plan
    • Data Understanding
    • Data Preparation
    • Modelling
    • Evaluation
    • Deployment
    • Iteration of the above
  • Buy-in and integration with other parts of the business
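The plan above can be sketched as a small loop. This is only an illustrative outline under stated assumptions, not a prescribed implementation: the toy data, the function names, the one-rule "model" and the 90% accuracy target are all hypothetical, chosen to show how a testable requirement drives iteration through the phases.

```python
# Hypothetical toy data: (hours_of_use, churned) pairs; None marks a missing value.
RAW = [(1.0, 1), (2.0, 1), (None, 0), (8.0, 0), (9.0, 0), (3.0, 1), (7.0, 0)]


def understand(rows):
    """Data Understanding: summarise size and missing values."""
    missing = sum(1 for hours, _ in rows if hours is None)
    return {"rows": len(rows), "missing": missing}


def prepare(rows):
    """Data Preparation: drop rows whose feature is missing."""
    return [(hours, label) for hours, label in rows if hours is not None]


def build_model(threshold):
    """Modelling: a one-rule classifier that predicts churn below a usage threshold."""
    return lambda hours: 1 if hours < threshold else 0


def evaluate(predict, rows):
    """Evaluation: fraction of rows classified correctly.

    For brevity this scores on the same rows used to pick the rule;
    a real project would hold out separate test data.
    """
    correct = sum(1 for hours, label in rows if predict(hours) == label)
    return correct / len(rows)


def run_project(rows, target_accuracy, thresholds):
    """Iterate over candidate models until the testable requirement is met."""
    summary = understand(rows)
    clean = prepare(rows)
    for threshold in thresholds:
        accuracy = evaluate(build_model(threshold), clean)
        if accuracy >= target_accuracy:
            # Deployment would happen here; the sketch only records the decision.
            return {"summary": summary, "threshold": threshold,
                    "accuracy": accuracy, "deploy": True}
    return {"summary": summary, "threshold": None,
            "accuracy": None, "deploy": False}


result = run_project(RAW, target_accuracy=0.9, thresholds=[2.0, 5.0])
print(result)
```

The first candidate threshold fails the requirement, so the loop iterates to the second one. The point is that the requirement is written down before modelling starts, which makes it possible to say objectively when the project is ready to deploy.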

However, projects are also undertaken for philanthropic or scientific reasons, so these arguments may not apply directly to charitable causes or academia.

More articles

101: Why Data Science?

This section introduces Data Science. It explains what it is and why we need it. We discuss some of the reasons for doing Data Science and provide famous examples from around the world.


201: Basics and Terminology

Now that we have a firm understanding of how business problems map to solutions, we need to learn the techniques to deliver those solutions. This section introduces the basic terminology and concepts used in data science.
