Machine Learning - Winder.AI Blog

Industrial insight and articles from Winder.AI, focusing on the topic Machine Learning

Organizational Machine Learning: Provenance for Data, Pipelines, and Deployments

Wed Feb 16, 2022, by Phil Winder, in Machine Learning

When: Wed Feb 16, 2022 at 16:30 UTC. Where: LinkedIn Events. Dr. Phil Winder shares Winder.AI’s machine learning consulting experience at a variety of large and small organizations. In this talk he focuses on ML tooling, and how provenance, typically thought of as a model deployment problem, can help make the development of machine learning models more repeatable, understandable, and robust. About This Series: Welcome to Winder.

The Value of a Machine Learning Pipeline: Past, Present, and the Future of MLOps With Kubeflow

Mon Nov 1, 2021, by Phil Winder, in Machine Learning, MLOps

Industrial machine learning consulting projects come in a variety of forms. Sometimes clients ask for exploratory data analysis, to evaluate whether their data can be used to help solve a problem using artificial intelligence. Other times we use machine learning (ML) algorithms to automate decisions and improve efficiencies within a business or product. More recently we’ve refocused on reinforcement learning and customers ask us to help control some complex multi-step process.

Fast Time-Series Filters in Python

Fri Oct 4, 2019, by Phil Winder, in Machine Learning

Time-series (TS) filters are often used in digital signal processing for distributed acoustic sensing (DAS). The goal is to remove a subset of frequencies from a digitised TS signal. To filter a signal you must touch all of the data and perform a convolution. This is a slow process when you have a large amount of data. The purpose of this post is to investigate which filters are fastest in Python.
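To make the "touch all of the data and perform a convolution" point concrete, here is a minimal sketch of a direct-form filter in pure Python. The function name `fir_filter`, the two-tap averaging kernel, and the toy signal are illustrative assumptions, not code from the post; in practice a vectorised library routine would be used, since a nested loop like this is exactly the slow path.

```python
def fir_filter(signal, kernel):
    """Convolve signal with kernel, keeping only fully overlapping outputs.

    Every output sample reads len(kernel) input samples, so the cost grows
    with len(signal) * len(kernel).
    """
    m = len(kernel)
    out = []
    for i in range(len(signal) - m + 1):
        acc = 0.0
        for k in range(m):
            # Convolution: the kernel is applied in reversed order.
            acc += kernel[k] * signal[i + m - 1 - k]
        out.append(acc)
    return out


# Toy signal (hypothetical): a constant offset of 2 plus the fastest
# possible oscillation, which alternates sign on every sample.
signal = [2 + (-1) ** i for i in range(8)]

# A two-tap moving average is a very crude low-pass filter.
kernel = [0.5, 0.5]

filtered = fir_filter(signal, kernel)
print(filtered)
```

The averaging kernel cancels the alternating component and keeps the constant offset, which is the "remove a subset of frequencies" behaviour in its simplest form.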

Scikit Learn to Pandas: Data types shouldn't be this hard

Sun Feb 3, 2019, by Phil Winder, in Machine Learning

Nearly everyone using Python for Data Science has used or is using the Pandas Data Analysis/Preprocessing library. It is as much of a mainstay as Scikit-Learn. Despite this, one continuing bugbear is the different core data types used by each: pandas.DataFrame and np.ndarray. Wouldn’t it be great if we didn’t have to worry about converting DataFrames to numpy types and back again? Yes, it would. Step forward Sklearn Pandas. Sklearn Pandas, part of the Scikit Contrib package, adds some syntactic sugar to use DataFrames in sklearn pipelines and back again.

Principal Component Analysis

Sun Jan 28, 2018, in Machine Learning, Workshop

Dimensionality Reduction - Principal Component Analysis. Welcome! This workshop is from Winder.ai. Sign up to receive more free workshops, training and videos. Sometimes data has redundant dimensions. For example, when predicting a person's weight from their height you would expect that information about their eye colour provides no predictive power. In this simple case we can just remove that feature from the data. With more complex data it is usual to have combinations of features that provide predictive power.

101: Why Data Science?

Mon Jan 1, 2018, in Training, Machine Learning, Data Science

What is Data Science? Software Engineering, Maths, Automation, Data. A.k.a.: Machine Learning, AI, Big Data, etc. Its current rise in popularity is due to more data and more computing power. For more information: https://winderresearch.com/what-is-data-science/ Examples: US Supermarket Giants. Target: Optimising Marketing using customer spending data. Walmart: Predicting demand ahead of a natural disaster. Discovery: Most projects are “Discovery Projects”.

102: How to do a Data Science Project

Mon Jan 1, 2018, in Training, Machine Learning, Data Science

Problems in Data Science Understanding the problem “the five-whys” Different questions dramatically affect the tools and techniques used to solve the problem. Data Science as a Process More Science than Engineering High risk High reward Difficult Unpredictable By Kenneth Jensen CC BY-SA 3.0, via Wikimedia Commons Impacts of Data Science What is the purpose of the project?

201: Basics and Terminology

Mon Jan 1, 2018, in Training, Machine Learning, Data Science

The ultimate goal First let's discuss what the goal is. What is the goal? The goal is to make a decision or a prediction. Based upon what? Information. How can we improve the quality of the decision or prediction? The quality of the solution is defined by the certainty represented by the information. Think about this for a moment. It’s a key insight. Think about your projects.

202: Segmentation For Classification

Mon Jan 1, 2018, in Training, Machine Learning

Segmentation So let’s walk through a very visual, intuitive example to help describe what all data science algorithms are trying to do. This will seem quite complicated if you’ve never done anything like this before. That’s ok! I want to do this to show you that all algorithms that you’ve ever heard of have some very basic assumption of what they are trying to do. At the end of this, we will have completely derived one very important type of classifier.

203: Examples and Decision Trees

Mon Jan 1, 2018, in Training, Machine Learning

Example: Segmentation via Information Gain There’s a fairly famous dataset called the “mushroom dataset”. It describes whether mushrooms are edible or not, depending on an array of features. The nice thing about this dataset is that the features are all categorical. So we can go through and segment the data for each value in a feature. This is some example data:

poisonous  cap-shape  cap-surface  cap-color  bruises?
p          x          s            n          t
e          x          s            y          t
e          b          s            w          t
p          x          y            w          t
e          x          s            g          f
etc.
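As a rough illustration of segmenting by each feature value, the sketch below computes Shannon entropy and information gain over only the five example rows quoted above. The helper names (`entropy`, `information_gain`) are hypothetical and not taken from the workshop, and five rows are far too few to say anything about the real dataset; the point is only to show the mechanics.

```python
from collections import Counter
from math import log2

COLUMNS = ["poisonous", "cap-shape", "cap-surface", "cap-color", "bruises?"]

# The five example rows from the excerpt; column 0 is the label.
ROWS = [
    ("p", "x", "s", "n", "t"),
    ("e", "x", "s", "y", "t"),
    ("e", "b", "s", "w", "t"),
    ("p", "x", "y", "w", "t"),
    ("e", "x", "s", "g", "f"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, feature_index, label_index=0):
    """Entropy of the labels minus the weighted entropy after segmenting
    the rows by each distinct value of one feature."""
    labels = [row[label_index] for row in rows]
    segments = {}
    for row in rows:
        segments.setdefault(row[feature_index], []).append(row[label_index])
    remainder = sum(
        len(seg) / len(rows) * entropy(seg) for seg in segments.values()
    )
    return entropy(labels) - remainder


# Score every feature column (skip column 0, the label itself).
gains = {
    name: information_gain(ROWS, index)
    for index, name in enumerate(COLUMNS)
    if index != 0
}
best_feature = max(gains, key=gains.get)

for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name:12s} {gain:.3f}")
```

The feature with the largest gain is the one whose segments are purest with respect to the label, which is the criterion a decision tree would use to choose its first split.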