Winder.AI Blog

Industrial AI insight about machine learning, reinforcement learning, MLOps, and more...


A Simple Docker-Based Workflow for Deploying a Machine Learning Model


Fri Apr 24, 2020, by Phil Winder, in MLOps, Cloud Native

In software engineering, the famous quote by Phil Karlton, extended by Martin Fowler, goes something like: “There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.” In data science, one hard thing towers over all the others: deployment.

COVID-19 Logistic Bayesian Model


Wed Apr 8, 2020, by Phil Winder, in Data Science

This post builds upon the exponential model created in a previous post. The main issue with that model was that an exponential curve has no upper limit; a logistic model introduces one. I also perform some very basic backtesting and forecasting.
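To make the difference concrete, here is a minimal NumPy sketch of the two curve shapes. The function names and the parameter values (`K`, `b`, `t0`) are hypothetical and are not the fitted values from the post. The point is only that the logistic curve flattens out at its limit `K`, while an exponential with the same early growth keeps climbing.

```python
import numpy as np


def exponential(t, a, b):
    """Exponential growth a * exp(b * t): grows without bound as t increases."""
    return a * np.exp(b * t)


def logistic(t, K, b, t0):
    """Logistic growth K / (1 + exp(-b * (t - t0))): saturates at the limit K."""
    return K / (1.0 + np.exp(-b * (t - t0)))


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    K, b, t0 = 100_000.0, 0.2, 40.0
    # Match the early phase: for t << t0, logistic(t) ~= K * exp(b * (t - t0)).
    a = K * np.exp(-b * t0)

    for day in (0, 20, 40, 60, 80, 120):
        print(f"day {day:3d}: exponential={exponential(day, a, b):16.0f}  "
              f"logistic={logistic(day, K, b, t0):10.0f}")
```

Early on the two curves are almost indistinguishable, which is why an exponential fit can look fine until the data start to bend over.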

COVID-19 Exponential Bayesian Model


Wed Apr 8, 2020, by Phil Winder, in Data Science

The purpose of this notebook is to provide initial experience with the pymc3 library for modeling and forecasting COVID-19 virus summary statistics. The model is very simple, and therefore not very accurate, but it serves as a good introduction to the topic.
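For readers new to the approach, a simple Bayesian exponential model can be written as follows. This is a sketch under assumed priors and likelihood. The symbols $a$, $b$, $\sigma$ and the Normal/HalfNormal choices are illustrative and not necessarily what the notebook uses. Here $y_t$ is the reported count on day $t$:

$$
\begin{aligned}
a &\sim \mathrm{HalfNormal}(\sigma_a) \\
b &\sim \mathrm{Normal}(\mu_b, \sigma_b) \\
\sigma &\sim \mathrm{HalfNormal}(\sigma_\epsilon) \\
\mu_t &= a\, e^{b t} \\
y_t &\sim \mathrm{Normal}(\mu_t, \sigma)
\end{aligned}
$$

A library such as pymc3 draws samples from the posterior $p(a, b, \sigma \mid y)$ using MCMC. Pushing each posterior sample through $\mu_t$ for future days gives a distribution of forecasts rather than a single curve.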

How to Start a Data Science Project With No or Little Data


Wed Feb 26, 2020, by Hajar Khizou, in Data Science

Data is an essential asset of modern business. It empowers companies by surfacing unique insights about their customers and enabling actionable products. The more data you possess, the better you can meet and exceed your customers' expectations.

Keep it Clean: Why Bad Data Ruins Projects and How to Fix it - NDC London

Thu Jan 30, 2020, in Data Science, Talk

Abstract: The Internet is full of examples of how to train models, but in reality industrial projects spend most of their time working with data. The largest improvements in performance can often be found by improving the underlying data. Bad data is estimated to cost the US economy $3.1 trillion, and approximately 27% of the data in the world’s top companies is flawed.

Keep it Clean: Why Bad Data Ruins Projects and How to Fix it - GOTO Berlin


Thu Oct 24, 2019, by Phil Winder, in Talk, Data Science

Abstract: The Internet is full of examples of how to train models, but in reality industrial projects spend most of their time working with data. The largest improvements in performance can often be found by improving the underlying data. Bad data is estimated to cost the US economy $3.1 trillion, and approximately 27% of the data in the world’s top companies is flawed. Bad data also contributes to the failure of many data science projects.

Fast Time-Series Filters in Python


Fri Oct 4, 2019, by Phil Winder, in Machine Learning

Time-series (TS) filters are often used in digital signal processing for distributed acoustic sensing (DAS). The goal is to remove a subset of frequencies from a digitised TS signal. To filter a signal, you must touch all of the data and perform a convolution, which is slow when you have a large amount of data. The purpose of this post is to investigate which filters are fastest in Python.
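Because filtering is a convolution, one common way to speed it up is to perform the convolution in the frequency domain. The sketch below uses only NumPy and a hypothetical moving-average (boxcar) low-pass kernel, not any of the specific filters benchmarked in the post. It computes the same filtered signal two ways. Direct time-domain convolution costs O(N·M) for N samples and an M-tap kernel. FFT-based convolution costs O(L log L), with L = N + M - 1.

```python
import numpy as np


def boxcar_kernel(width: int) -> np.ndarray:
    """A simple FIR low-pass kernel: `width` equal taps that sum to 1."""
    return np.ones(width) / width


def filter_direct(signal: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Time-domain linear convolution, O(N * M) multiply-adds."""
    return np.convolve(signal, kernel, mode="full")


def filter_fft(signal: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Frequency-domain linear convolution, O(L log L) with L = N + M - 1."""
    n = len(signal) + len(kernel) - 1
    # Zero-padding both inputs to length n makes the FFT's circular
    # convolution equal to the linear convolution computed by np.convolve.
    spectrum = np.fft.rfft(signal, n) * np.fft.rfft(kernel, n)
    return np.fft.irfft(spectrum, n)


if __name__ == "__main__":
    fs = 1_000.0                # hypothetical sample rate, Hz
    t = np.arange(10_000) / fs  # 10 seconds of samples
    # A 5 Hz "signal" plus a 200 Hz component we want to remove.
    x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
    # A 25-tap boxcar at 1 kHz has response zeros at nonzero multiples of 40 Hz,
    # so the 200 Hz component is cancelled once the filter is fully overlapped.
    k = boxcar_kernel(25)

    y_direct = filter_direct(x, k)
    y_fft = filter_fft(x, k)
    print("results match:", np.allclose(y_direct, y_fft))
```

For a kernel this short the direct method is usually fast enough. The FFT route tends to pay off as the kernel gets longer, which is one reason the choice of filter implementation matters for large DAS datasets.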