102: How to do a Data Science Project

Problems in Data Science

  • Understanding the problem

  • “the five-whys”

  • Different questions dramatically affect the tools and techniques used to solve the problem.

Data Science as a Process

  • More Science than Engineering

Research Problem Model

  • High risk
  • High reward
  • Difficult
  • Unpredictable

CRISP-DM Process

Figure: CRISP-DM process diagram. By Kenneth Jensen, CC BY-SA 3.0, via Wikimedia Commons.

Impacts of Data Science

  • What is the purpose of the project? Who is affected?

  • Which parts of the business are affected? Do we need help?

  • You must think about the human concerns.

  • You need buy-in from the business; the business will be affected.


Business goals: make money, save money, or save time. In a business, Data Science exists to generate profit.

Project justifications - you now know how they are judged:

  • Alignment with Business Goals
  • A well-defined, testable requirement
  • A robust plan
    • Data Understanding
    • Data Preparation
    • Modelling
    • Evaluation
    • Deployment
    • Iteration of the above
  • Buy-in and integration with other parts of the business

However, there are also philanthropic and scientific reasons for undertaking a project, so these arguments may not apply directly to charitable causes or academia.
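
The plan listed above can be sketched in code. The following is a minimal illustration, not a prescribed method: it runs the Data Understanding, Data Preparation, Modelling, and Evaluation phases on synthetic data and checks one testable requirement. The data, the function names, and the requirement itself ("the model must beat a predict-the-mean baseline on held-out data") are all assumptions made for the example.

```python
import random
import statistics


def make_data(n=200, seed=0):
    """Generate synthetic (x, y) rows; a stand-in for real business data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 3.0 * x + 2.0 + rng.gauss(0, 1)  # assumed linear signal plus noise
        rows.append((x, y))
    return rows


def understand(rows):
    """Data Understanding: summarise what we actually have."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return {"n": len(rows), "x_mean": statistics.mean(xs), "y_mean": statistics.mean(ys)}


def prepare(rows, test_fraction=0.25, seed=0):
    """Data Preparation: drop incomplete rows, then split into train and test."""
    clean = [r for r in rows if r[0] is not None and r[1] is not None]
    rng = random.Random(seed)
    rng.shuffle(clean)
    cut = int(len(clean) * (1 - test_fraction))
    return clean[:cut], clean[cut:]


def fit_line(train):
    """Modelling: ordinary least squares with a single feature."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in train) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def evaluate(model, train, test):
    """Evaluation: mean absolute error of the model and of a mean baseline."""
    slope, intercept = model
    baseline = statistics.mean([y for _, y in train])
    model_mae = statistics.mean([abs(slope * x + intercept - y) for x, y in test])
    baseline_mae = statistics.mean([abs(baseline - y) for _, y in test])
    return model_mae, baseline_mae


def run_project():
    """Run the phases in order and check the (hypothetical) requirement."""
    rows = make_data()
    summary = understand(rows)
    train, test = prepare(rows)
    model = fit_line(train)
    model_mae, baseline_mae = evaluate(model, train, test)
    return {
        "summary": summary,
        "model_mae": model_mae,
        "baseline_mae": baseline_mae,
        "requirement_met": model_mae < baseline_mae,
    }


if __name__ == "__main__":
    print(run_project())
```

Deployment and iteration are left out of the sketch. In practice, a failed requirement check would send the project back to an earlier phase, which is the "iteration of the above" step in the plan.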
