Keep it Clean: Why Bad Data Ruins Projects and How to Fix it

Published
Author
Dr. Phil Winder
CEO

Abstract: The Internet is full of examples of how to train models. But in reality, industrial projects spend the majority of their time working with data. The largest improvements in performance can often be found through improving the underlying data. Bad data costs the US economy an estimated 3.1 trillion dollars, and approximately 27% of data is flawed in the world’s top companies. Bad data also contributes to the failure of many Data Science projects.

Read more

Google Releases AI Platform with help from Winder.AI

Published
Author
Dr. Phil Winder
CEO

At its Cloud Next ’19 conference, Google announced the launch of an expanded AI platform. For a number of years Google has been expanding its portfolio to compete with AI products from Azure and AWS. But this is the first time that the platform can be considered “end-to-end”.

Read more

DevOps and Data Science: DataDevOps?

Published
Author
Dr. Phil Winder
CEO

I’ve seen a few posts recently about the emergence of a new field that is kind of like DevOps, but not quite, because it involves too much data. Verbally, about two years ago and in blog form about a year ago, I used the word DataDevOps, because that’s what I did. I develop and operate Data Science platforms, products and services. But more recently I have read of the emergence of DataOps.

Read more

Using Data Science to block hackers

Published
Author
Dr. Phil Winder
CEO

Executive Summary: Winder.AI was engaged by Bitsensor to research and implement Data Science algorithms that could automate the detection and classification of web attackers. After gathering data, researching a Machine Learning solution and implementing Cloud-Native software, we delivered three new features: Tool classification - detect which automated tools were being used to perform the attack; Attacker grouping - provide the capability of detecting distributed attacks by the same attacker; Killchain classification - establish the phase of an attack (e.

Read more

Cloud Native Data Science: Best Practices

Published
Author
Dr. Phil Winder
CEO

Following the Cloud Native best practices of immutability, automation and provenance will serve you well in a Cloud Native Data Science (CNDS) project. But working with data brings its own subtle challenges around these themes.
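To make the provenance theme a little more concrete, here is a minimal sketch of one way to record where a result came from. It is not taken from the full post: the function names (`fingerprint`, `provenance_record`), the record fields and the sample data are all hypothetical. The idea is to treat the dataset as immutable and identify it by a content hash, so that any change to the data produces a different record.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 content hash that identifies an immutable dataset."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(dataset: bytes, code_version: str, params: dict) -> dict:
    """Bundle the inputs that produced a result (hypothetical record layout)."""
    return {
        "data_sha256": fingerprint(dataset),
        "code_version": code_version,
        # Sort keys so the same parameters always serialise identically.
        "params": json.dumps(params, sort_keys=True),
    }


# Hypothetical sample data, standing in for a real dataset file.
raw = b"id,value\n1,10\n2,20\n"

record = provenance_record(raw, code_version="abc123", params={"learning_rate": 0.1})
same = provenance_record(raw, code_version="abc123", params={"learning_rate": 0.1})
changed = provenance_record(raw + b"3,30\n", code_version="abc123", params={"learning_rate": 0.1})

print(record == same)                                  # identical inputs, identical record
print(record["data_sha256"] == changed["data_sha256"])  # changed data, different hash
```

Storing a record like this alongside each model or report is one simple way to answer "which data produced this?" later on. A real project would likely hash files in chunks and store the records somewhere durable.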

Read more

Cloud Native Data Science: Technology

Published
Author
Dr. Phil Winder
CEO

Technology choices in data-driven products are, as you would expect, largely directed by the type and amount of data. The first and most crucial decision to make is whether the data will be processed in a batch or streaming fashion.
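As a rough illustration of the batch versus streaming distinction, the sketch below computes the same statistic (a mean) both ways. The function names and the sample readings are assumptions made for this example, not code from the full post. The batch version needs the whole dataset up front. The streaming version updates its answer as each record arrives and keeps no history.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch: all the data is available up front and processed in one go."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Streaming: yield an updated mean after every record that arrives."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        # Incremental update: only the running mean and count are stored.
        mean += (value - mean) / count
        yield mean


# Hypothetical sensor readings.
readings = [2.0, 4.0, 6.0, 8.0]

batch_result = batch_mean(readings)
stream_results = list(streaming_mean(iter(readings)))

print(batch_result)    # 5.0
print(stream_results)  # [2.0, 3.0, 4.0, 5.0]
```

Both approaches agree once all the data has been seen, but the streaming version has an answer available at every step. That difference in latency and state is what tends to drive the rest of the technology choices.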

Read more

Cloud Native Data Science: Strategy

Published
Author
Dr. Phil Winder
CEO

Data Science has become an important part of any business because it provides a competitive advantage. Very early on, Amazon’s data on book purchases allowed them to deliver personalised recommendations whilst customers were browsing their site. Their main competitor in the US at the time was Borders, who mainly operated in physical stores. This physicality prevented them from seamlessly providing customers with personalised recommendations [1]. This example highlights how strategic business decisions and data science are inextricably linked.

Read more

Life and Death Decisions: Testing Data Science

Abstract: We live in a world where decisions are being made by software. From mortgage applications to driverless vehicles, the results can be life-changing. But the benefits of automation are clear. If businesses use data science to automate decisions they will become more productive and more profitable. So the question becomes: how can we be sure that these algorithms make the best decisions? How can we prove that an autonomous vehicle will make the right decision when life depends on it?

Read more

201: Basics and Terminology

The ultimate goal: First, let’s discuss what the goal is. What is the goal? The goal is to make a decision or a prediction. Based upon what? Information. How can we improve the quality of the decision or prediction? The quality of the solution is defined by the certainty represented by the information. Think about this for a moment. It’s a key insight. Think about your projects. Your research. The decisions you make.

Read more

102: How to do a Data Science Project

Problems in Data Science: Understanding the problem: “the five-whys”. Different questions dramatically affect the tools and techniques used to solve the problem. Data Science as a Process: more Science than Engineering; high risk; high reward; difficult; unpredictable. (Image credit: Kenneth Jensen, CC BY-SA 3.0, via Wikimedia Commons.) Impacts of Data Science: What is the purpose of the project? Who is affected? Which parts of the business are affected? Do we need help?

Read more