Winder.AI Blog

Industrial AI insight about machine learning, reinforcement learning, MLOps, and more...

Keep it Clean: Why Bad Data Ruins Projects and How to Fix it - GOTO Berlin

Thu Oct 24, 2019, by Phil Winder, in Talk, Data Science

Abstract: The Internet is full of examples of how to train models, but in reality industrial projects spend the majority of their time working with data. The largest improvements in performance can often be found by improving the underlying data. Bad data costs the US economy an estimated $3.1 trillion, and approximately 27% of the data in the world’s top companies is flawed. Bad data also contributes to the failure of many Data Science projects.

Fast Time-Series Filters in Python

Fri Oct 4, 2019, by Phil Winder, in Machine Learning

Time-series (TS) filters are often used in digital signal processing for distributed acoustic sensing (DAS). The goal is to remove a subset of frequencies from a digitised TS signal. To filter a signal you must touch all of the data and perform a convolution. This is a slow process when you have a large amount of data. The purpose of this post is to investigate which filters are fastest in Python.
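To make the convolution step concrete, here is a minimal sketch in pure Python. It is not the code benchmarked in the post: the helper names `moving_average_kernel` and `convolve_valid` are hypothetical, and the moving-average kernel is just an assumed, very simple low-pass filter chosen for illustration. It shows why every sample has to be touched: each output value is a weighted sum over a window of the input.

```python
def moving_average_kernel(width):
    """Return `width` equal taps that sum to 1 (a simple low-pass FIR kernel)."""
    if width < 1:
        raise ValueError("width must be a positive integer")
    return [1.0 / width] * width


def convolve_valid(signal, kernel):
    """Direct convolution, keeping only positions where the kernel fully overlaps.

    This costs len(kernel) multiplications per output sample, which is why
    filtering large signals this way is slow.
    """
    n, m = len(signal), len(kernel)
    if m > n:
        return []
    flipped = kernel[::-1]  # convolution reverses the kernel
    return [
        sum(signal[i + j] * flipped[j] for j in range(m))
        for i in range(n - m + 1)
    ]


# Illustrative input: a slow ramp plus a fast alternating +1/-1 component.
signal = [0.1 * t + (1.0 if t % 2 == 0 else -1.0) for t in range(12)]

# A two-tap average cancels the alternating component and keeps the ramp.
filtered = convolve_valid(signal, moving_average_kernel(2))
```

In practice a vectorised library routine would replace the inner loop; the sketch only illustrates the operation whose speed is being compared.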

A Comparison of Reinforcement Learning Frameworks: Dopamine, RLlib, Keras-RL, Coach, TRFL, Tensorforce and more

Mon Jul 1, 2019, by Phil Winder, in Reinforcement Learning

Reinforcement Learning (RL) frameworks help engineers by creating higher level abstractions of the core components of an RL algorithm. This makes code easier to develop, easier to read and improves efficiency.

But choosing a framework introduces some amount of lock-in. An investment in learning and using a framework can make it hard to break away. This is just like deciding which pub to visit: once you are inside, it’s very difficult not to buy a beer, no matter how bad the place is.

Announcement: New Reinforcement Learning Book with O'Reilly

Tue Jun 25, 2019, by Phil Winder, in Reinforcement Learning

I’m excited to announce that I have agreed with O’Reilly Media to write a new book on Reinforcement Learning. The contracts have just been signed and I’ve started the writing process. It is likely to take around a year to be released so I’m hoping that it will be ready around Summer 2020.

Keep it Clean: Why Bad Data Ruins Projects and How to Fix it - GOTO Chicago

Tue Apr 30, 2019, in Data Science, Talk

Abstract: The Internet is full of examples of how to train models, but in reality industrial projects spend the majority of their time working with data. The largest improvements in performance can often be found by improving the underlying data. Bad data costs the US economy an estimated $3.1 trillion, and approximately 27% of the data in the world’s top companies is flawed. Bad data also contributes to the failure of many Data Science projects.

Google Releases AI Platform with help from Winder.AI

Fri Apr 12, 2019, by Phil Winder, in Data Science, Case Study

At its Cloud Next ’19 conference, Google announced the launch of an expanded AI platform. For a number of years Google has been expanding its portfolio to compete with AI products from Azure and AWS. But this is the first time that the platform can be considered “end-to-end”.

DevOps and Data Science: DataDevOps?

Thu Mar 28, 2019, by Phil Winder, in Data Science, MLOps

I’ve seen a few posts recently about the emergence of a new field that is kind of like DevOps, but not quite, because it involves too much data. Verbally, about two years ago and in blog form about a year ago, I used the word DataDevOps, because that’s what I did. I develop and operate Data Science platforms, products and services. But more recently I have read of the emergence of DataOps.

Local Jenkins Development Environment on Minikube on OSX

Mon Mar 11, 2019, by Phil Winder, in Software Engineering, Cloud Native

Developing Jenkinsfile pipelines is hard. I think my world record for the number of attempts to get a working Jenkinsfile is around 20. When you have to continually push and run your pipeline on a managed Jenkins instance, the feedback cycle is long. And the primary bottleneck to developer productivity is the length of the feedback cycle.

Scikit Learn to Pandas: Data types shouldn't be this hard

Sun Feb 3, 2019, by Phil Winder, in Machine Learning

Nearly everyone using Python for Data Science has used or is using the Pandas Data Analysis/Preprocessing library. It is as much of a mainstay as Scikit-Learn. Despite this, one continuing bugbear is the different core data types used by each: pandas.DataFrame and np.ndarray. Wouldn’t it be great if we didn’t have to worry about converting DataFrames to numpy types and back again? Yes, it would. Step forward Sklearn Pandas. Sklearn Pandas, part of the Scikit Contrib package, adds some syntactic sugar to use DataFrames in sklearn pipelines and back again.

7 Reasons Why You Shouldn't Use Helm in Production

Mon Jan 14, 2019, by Phil Winder, in Cloud Native

Helm is billed as “the package manager for Kubernetes”. The goal was to provide a high-level package management-like experience for Kubernetes. This was a goal for all the major containerisation platforms. For example, Apache Mesos has Mesos Frameworks. And given the standardisation on package management at an OS level (yum, apt-get, brew, choco, etc.) and an application level (npm, pip, gem, etc.), this makes total sense, right?