Data Science - Winder.AI Blog

Industrial insight and articles from Winder.AI, focusing on the topic of Data Science.

Keep it Clean: Why Bad Data Ruins Projects and How to Fix it - NDC London

Thu Jan 30, 2020, in Data Science, Talk

Abstract: The Internet is full of examples of how to train models. But the reality is that industrial projects spend the majority of the time working with data. The largest improvements in performance can often be found through improving the underlying data. Bad data is costing the US economy an estimated 3.1 trillion dollars and approximately 27% of data is flawed in the world’s top companies.

Keep it Clean: Why Bad Data Ruins Projects and How to Fix it - GOTO Berlin

Thu Oct 24, 2019, by Phil Winder, in Talk, Data Science

Abstract: The Internet is full of examples of how to train models. But the reality is that industrial projects spend the majority of the time working with data. The largest improvements in performance can often be found through improving the underlying data. Bad data is costing the US economy an estimated 3.1 trillion dollars and approximately 27% of data is flawed in the world’s top companies. Bad data also contributes to the failure of many Data Science projects.

Keep it Clean: Why Bad Data Ruins Projects and How to Fix it - GOTO Chicago

Tue Apr 30, 2019, in Data Science, Talk

Abstract: The Internet is full of examples of how to train models. But the reality is that industrial projects spend the majority of the time working with data. The largest improvements in performance can often be found through improving the underlying data. Bad data is costing the US economy an estimated 3.1 trillion dollars and approximately 27% of data is flawed in the world’s top companies. Bad data also contributes to the failure of many Data Science projects.

Google Releases AI Platform with help from Winder.AI

Fri Apr 12, 2019, by Phil Winder, in Data Science, Case Study

At its Cloud Next ’19 conference, Google announced the launch of an expanded AI platform. For a number of years Google has been expanding its portfolio to compete with AI products from Azure and AWS. But this is the first time that the platform can be considered “end-to-end”.

DevOps and Data Science: DataDevOps?

Thu Mar 28, 2019, by Phil Winder, in Data Science, MLOps

I’ve seen a few posts recently about the emergence of a new field that is kind of like DevOps, but not quite, because it involves too much data. I first used the word DataDevOps in conversation about two years ago, and in blog form about a year ago, because that’s what I did: I develop and operate Data Science platforms, products and services. But more recently I have read of the emergence of DataOps.

Using Data Science to block hackers

Sun Oct 28, 2018, by Phil Winder, in Case Study, Data Science

Executive Summary: Winder.AI was engaged by Bitsensor to research and implement Data Science algorithms that could automate the detection and classification of web attackers. After gathering data, researching a Machine Learning solution and implementing Cloud-Native software, we delivered three new features. Tool classification: detect which automated tools were being used to perform the attack. Attacker grouping: provide the capability of detecting distributed attacks by the same attacker. Killchain classification: establish the phase of an attack (e.

Cloud Native Data Science: Best Practices

Fri May 4, 2018, by Phil Winder, in Data Science, Cloud Native

Following the Cloud Native best practices of immutability, automation and provenance will serve you well in a CNDS project. But working with data brings its own subtle challenges around these themes.

Cloud Native Data Science: Technology

Thu May 3, 2018, by Phil Winder, in Data Science, Cloud Native

Technology choices in data-driven products are, as you would expect, largely directed by the type and amount of data. The first and most crucial decision to make is whether the data will be processed in a batch or streaming fashion.

Cloud Native Data Science: Strategy

Wed May 2, 2018, by Phil Winder, in Data Science, Cloud Native, Strategy

Data Science has become an important part of any business because it provides a competitive advantage. Very early on, Amazon’s data on book purchases allowed them to deliver personalised recommendations whilst customers were browsing their site. Their main competitor in the US at the time was Borders, who mainly operated in physical stores. This physicality prevented them from seamlessly providing customers with personalised recommendations [1]. This example highlights how strategic business decisions and data science are inextricably linked.

Life and Death Decisions: Testing Data Science

Wed Apr 25, 2018, in Data Science, Talk

Abstract: We live in a world where decisions are being made by software. From mortgage applications to driverless vehicles, the results can be life-changing. But the benefits of automation are clear. If businesses use data science to automate decisions they will become more productive and more profitable. So the question becomes: how can we be sure that these algorithms make the best decisions? How can we prove that an autonomous vehicle will make the right decision when life depends on it?