102: How to do a Data Science Project
Problems in Data Science
Understanding the problem
“the five-whys”
Different questions dramatically affect the tools and techniques used to solve the problem.
Data Science as a Process
- More Science than Engineering
- High risk
- High reward
- Difficult
- Unpredictable
Image by Kenneth Jensen, CC BY-SA 3.0, via Wikimedia Commons
Impacts of Data Science
What is the purpose of the project? Who is affected?
Which parts of the business are affected? Do we need help?
You must think about the human concerns.
You need buy-in from the business; the business will be affected.
Conclusions
Business goals: make money, save money, or save time. A Data Science project is justified by the profit it generates.
Project justifications - you now know how they are judged:
- Alignment with Business Goals
- A well-defined, testable requirement
- A robust plan:
  - Data Understanding
  - Data Preparation
  - Modelling
  - Evaluation
  - Deployment
  - Iteration of the above
- Buy-in and integration with other parts of the business
However, projects are also undertaken for philanthropic or scientific reasons, so these arguments may not apply directly to charitable causes or academia.
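The plan phases listed in the conclusions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy data, the function names, the choice of a straight-line model, and the error threshold that stands in for a "testable requirement". It is not a prescribed method, only one way the phases might map onto code.

```python
from statistics import mean

# Hypothetical toy data: (input, target) pairs; None marks a missing target.
raw = [(1.0, 2.1), (2.0, 3.9), (3.0, None), (4.0, 8.2), (5.0, 9.8), (6.0, 12.1)]

# Hypothetical testable requirement: mean absolute error must not exceed this.
MAX_MAE = 0.5


def understand(rows):
    """Data Understanding: count the rows and the missing targets."""
    missing = sum(1 for _, y in rows if y is None)
    return {"rows": len(rows), "missing": missing}


def prepare(rows):
    """Data Preparation: drop rows whose target is missing."""
    return [(x, y) for x, y in rows if y is not None]


def fit(rows):
    """Modelling: ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def evaluate(model, rows):
    """Evaluation: mean absolute error on held-out rows."""
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in rows)


summary = understand(raw)
clean = prepare(raw)
train, test = clean[:3], clean[3:]  # simple hold-out split
model = fit(train)
error = evaluate(model, test)

# Deployment decision: deploy only if the requirement is met, otherwise iterate.
ready_to_deploy = error <= MAX_MAE
print(summary, round(error, 3), ready_to_deploy)
```

Writing the requirement as an explicit threshold makes the evaluation step a pass/fail check, which is what makes the requirement testable. If the check fails, the iteration step means returning to an earlier phase rather than deploying.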