Our Data Science Research Process
1. Business Context
Every problem needs context from the business. A solution that works in one industry may not apply to another, and no two businesses are the same. Establishing shared context gets the project off to the right start.
2. Domain Knowledge Transfer
Businesses are usually the experts in their own domain, and that expertise helps direct the solution. Literature reviews ensure we start from a reasonable baseline.
3. Problem Definition/Clarification
Proofs of concept (POCs) usually start with only a vague idea of the problem to be solved. The problem definition often changes over time, becoming more concrete and adapting to what is possible given the data.
4. Data Capture/Generation
One thing that makes data science research projects unique is that the right data often doesn’t exist at the start. Significant effort is often spent capturing or generating new data that makes the problem easier to solve.
5. Data Exploration and Analysis
In this phase, expert data analysts extract knowledge from the data. This often yields actionable insight and is used to validate whether the solution is viable.
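As an illustration of a first exploration pass, the sketch below profiles a single column for missing values and basic statistics. It is a minimal example rather than a description of any particular toolchain: the `profile_column` helper and the `orders` sample are hypothetical, and a real project would typically work on full tables with a dataframe library.

```python
from statistics import mean, median


def profile_column(values):
    """Summarise one column: size, share of missing entries, basic statistics.

    Missing entries are assumed to be represented as None.
    """
    present = [v for v in values if v is not None]
    missing_rate = 1 - len(present) / len(values) if values else 0.0
    summary = {"count": len(values), "missing_rate": missing_rate}
    if present:
        summary.update(
            min=min(present),
            max=max(present),
            mean=mean(present),
            median=median(present),
        )
    return summary


# Hypothetical sample: daily order counts with two missing days.
orders = [12, 15, None, 9, 30, None, 14, 11]
report = profile_column(orders)
print(report)
```

A high missing rate or an implausible range at this point is often an early sign that more data capture is needed before any modelling starts.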
6. Model Development and Evaluation
Once the data analysis has shown that the idea is sound, an initial phase of model exploration tests whether the problem can be solved automatically. The models are then evaluated to check that they perform well enough to suggest the solution is viable. Note that even at this late stage it is sometimes necessary to revisit the problem definition.
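One simple way to judge whether a model performs well enough is to compare it against a trivial baseline. The sketch below assumes a binary classification task and uses a majority-class baseline. The labels, the model predictions, and the `MIN_LIFT` threshold are all hypothetical values chosen for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """Predict the most frequent training label for all n test examples."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n


# Hypothetical labels and model output.
y_train = [0, 0, 0, 1, 0, 1, 0, 0]
y_test = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]
model_pred = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]

baseline_acc = accuracy(y_test, majority_baseline(y_train, len(y_test)))
model_acc = accuracy(y_test, model_pred)

# Assumed threshold: the model must beat the baseline by at least 5 points.
MIN_LIFT = 0.05
is_viable = (model_acc - baseline_acc) >= MIN_LIFT
print(f"baseline={baseline_acc:.2f} model={model_acc:.2f} viable={is_viable}")
```

If the model cannot clearly beat a baseline this simple, that is usually a signal to revisit the data or the problem definition rather than to tune the model further.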
7. Solution Evaluation
In an industrial context, it is important to include a high-level evaluation of how well a solution actually solves the business problem. This is a common issue with academic research: solutions are often not pragmatic enough to work in the real world.
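To make the difference between model metrics and business value concrete, the sketch below scores two sets of predictions by the cost of their errors instead of by accuracy. The cost figures, labels, and predictions are hypothetical. In practice the costs would come from the business, for example the price of a wasted manual check versus the price of a missed fault.

```python
def count_errors(y_true, y_pred):
    """Return (false_positives, false_negatives) for binary 0/1 labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp, fn


def business_cost(y_true, y_pred, cost_fp, cost_fn):
    """Total cost of the errors, weighting each error type by its price."""
    fp, fn = count_errors(y_true, y_pred)
    return fp * cost_fp + fn * cost_fn


# Assumed costs: a false alarm is cheap, a missed case is expensive.
COST_FP = 10
COST_FN = 200

y_true = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]
pred_a = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]  # 8/10 correct, misses one positive
pred_b = [0, 1, 1, 0, 1, 1, 1, 0, 1, 0]  # 7/10 correct, misses no positives

cost_a = business_cost(y_true, pred_a, COST_FP, COST_FN)
cost_b = business_cost(y_true, pred_b, COST_FP, COST_FN)
print(f"cost_a={cost_a} cost_b={cost_b}")
```

Under these assumed costs the less accurate predictions are the cheaper ones, which is why accuracy alone does not settle whether a solution solves the business problem.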
8. Reporting
Once the models are validated, it’s time to report the results back to the stakeholders. After this phase we often start looking at another problem, or promote the project to a fully-fledged machine learning development project.