How We Work With Cloud-Native Data Science: An Interview With Phil Winder
by Dr. Phil Winder, CEO
How has Cloud-Native Data Science evolved over the years and how could it further evolve?
Data Science (see What is Data Science?), in its most general sense, has been used in industry for decades. But only in the last decade has the amount of data and computational power increased to the point where engineers can create seemingly magical solutions for complex decision-oriented problems.
One of the key drivers is the ability to scale out computational resources, both to train these complex models and to serve users and customers in an appropriate amount of time. This is where cloud-native technologies help. Cloud-native (see What is Cloud Native?) software is designed so that it can scale to handle any amount of data and any number of users in a cost-effective manner.
Cloud-Native Data Science (see Cloud Native Data Science: Strategy) is a more recent combination of approaches to create intelligence at scale. Open source frameworks such as Kubernetes and Kubeflow, supported by commercial equivalents from companies and cloud providers, have really pushed the boundaries of data science and cloud computing to the point where companies of any size can build production-grade scalable Artificial Intelligence (AI) solutions to benefit their business.
Winder.AI offers concept-to-production implementations of Machine Learning, AI and Cloud-Native applications. Could you take us through a successful case study?
There are two main departments in Winder.AI. One part of the company helps businesses design and build internal data science systems and best practices. We’ve worked with several high-profile enterprise clients to help them build out their data science practices. For example, Neste, a Finnish energy company, wanted to encourage their engineers to improve their data science practice, so, working alongside their team, we helped them define a data science handbook that spells out what they should be doing at each phase of a product’s life cycle. This simple addition produced spectacular results and cut POC-to-production times in half. We also worked to further refine their data science process with a fully managed data science workflow.
Another large enterprise wanted more flexibility, so we were able to architect and build out a Kubernetes-based data science platform. Again, having these systems in place means the company can enforce controls, unify operations and cut project times significantly. In one instance, we had a project that went from concept to production in two weeks, whereas it would have taken six months prior to our work.
The second department works on applications. Here, the team helps our clients design and build data-based products. For example, we helped build parts of Google’s AI Hub, built a fully ML-driven web application firewall for Bitsensor, designed sophisticated acoustic detection algorithms for distributed acoustic sensing (DAS) companies like Focus Sensors, Frauscher and Optasense, and built AI recruitment agents, natural language document search tools and much, much more.
You can view in-depth case studies on our website.
How can a company start a Data Science project with no or little data?
If your company is trying to start a Data Science project, the key is expertise. You need to prioritise the problems you are trying to solve by complexity and value, and an expert is the best person to help you do that. Defining a solvable problem comes first, because failing to do so is the number one reason projects fail. The next hurdle is data: by some estimates, companies lose 30% of their revenue on average due to bad data.
Many projects start with little or no data, but you can still provide a lot of value. It is relatively straightforward to augment and bootstrap data to help the ML algorithms, but there are a lot of pitfalls that you need to be aware of. One pattern I often use is to find a simpler version of the same problem that your existing data can address, and use it as a stepping stone to bigger challenges. My colleague Hajar wrote an excellent blog post on this topic which I recommend you read (see How to Start a Data Science Project With No or Little Data).
If you were looking for collaborations and joint ventures, which companies and industries would you target and why?
The nature of cloud-native data science is that it is truly cross-industry, in the same way, for example, that building a website is. We don’t target specific industries or company sizes. We believe that AI/ML/Data Science has the ability to improve all businesses and we don’t think that anyone should be left out.
When someone asks us for help, we usually perform an assessment to establish where they are on their data journey. Based on this, we recommend the most valuable next step. For some experienced companies, that might be refining how their data scientists build products. For nascent companies, we might suggest that we help them build minimum viable products. For collaborations or joint ventures, we integrate ourselves into their team and match their working practices.
How do you see Data Science evolving over the next few years?
Businesses are using data to improve productivity through automation and to differentiate themselves from their competitors. This isn’t going to change any time soon. It will become normal, almost expected, that companies use their data responsibly to improve their products and services. Those that embrace it fully, and with the right help, will improve their chances of success.
Beyond that, recent advancements in technologies such as reinforcement learning (see our book on Reinforcement Learning) show that the future will belong to companies that are able to exploit technology not only to automate decisions, but also to automate actions. Imagine a situation where your company is so instrumented, so integrated, that you can trust the automation of strategic-level actions to an algorithm, with the knowledge that it runs without bias or greed, without any of the human traits that can impair decision making. This is the future. It is fueled by data, driven by cloud-native data science and results in highly productive businesses.
Credits
This interview was conducted by the Yorkshire Business Journal.