Machine Learning

Industrial insight and articles from Winder.AI, focusing on the topic Machine Learning

Machine Learning Presentation: Packaging Your Models

Published: Mar 16, 2022
Author: Dr. Phil Winder
CEO

Dr. Phil Winder shares Winder.AI’s machine learning consulting experience at a variety of large and small organizations. Abstract: In this talk he focuses on packaging ML models for production serving. Learn how the cloud vendors compare, which abstractions orchestrators prefer, and how packaging tools seek to find the right abstractions. At the end of the talk Phil distils this information and presents best practices. There’s also some discussion of future trends and some ideas for aspiring open-source engineers.

Machine Learning Presentation: Provenance and Lineage for Data, Pipelines, and Deployments

Published: Feb 16, 2022
Author: Dr. Phil Winder
CEO

Dr. Phil Winder shares Winder.AI’s machine learning consulting experience at a variety of large and small organizations. Abstract: In this talk he focuses on how provenance and lineage, typically thought of as a model deployment problem, can help make the development of machine learning models more repeatable, understandable, and robust. Discover the difference between lineage and provenance. Learn how to determine the “strength” of your lineage and how robust it is to failure.

The Value of a Machine Learning Pipeline: Past, Present, and the Future of MLOps With Kubeflow

Published: Nov 1, 2021
Author: Dr. Phil Winder
CEO

Industrial machine learning consulting projects come in a variety of forms. Sometimes clients ask for exploratory data analysis, to evaluate whether their data can be used to help solve a problem using artificial intelligence. Other times we use machine learning (ML) algorithms to automate decisions and improve efficiencies within a business or product. More recently we’ve refocused on reinforcement learning, where customers ask us to help control complex multi-step processes.

Fast Time-Series Filters in Python

Published: Oct 4, 2019
Author: Dr. Phil Winder
CEO

Time-series (TS) filters are often used in digital signal processing for distributed acoustic sensing (DAS). The goal is to remove a subset of frequencies from a digitised TS signal. To filter a signal you must touch all of the data and perform a convolution. This is a slow process when you have a large amount of data. The purpose of this post is to investigate which filters are fastest in Python.
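
As a minimal sketch of the convolution step described above (not the benchmark code from the post), a moving-average low-pass filter can be written with NumPy's `np.convolve`. The function name `moving_average_filter`, the sample rate, and the test frequencies are assumptions chosen for illustration.

```python
import numpy as np


def moving_average_filter(signal, window):
    """Low-pass FIR filter: convolve the signal with a uniform kernel."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only the positions where the kernel fully overlaps the signal
    return np.convolve(signal, kernel, mode="valid")


# Hypothetical test signal sampled at 1000 Hz for one second:
# a slow 5 Hz component plus a fast 200 Hz component we want to remove.
t = np.arange(1000) / 1000.0
slow = np.sin(2 * np.pi * 5 * t)
noisy = slow + 0.5 * np.sin(2 * np.pi * 200 * t)

# A 5-sample window spans exactly one period of the 200 Hz component,
# so that component averages out while the slow component is only smoothed.
smoothed = moving_average_filter(noisy, window=5)
```

Every output sample requires a pass over `window` input samples, which is why filtering cost grows with both the data size and the kernel length.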

Scikit Learn to Pandas: Data types shouldn't be this hard

Published: Feb 3, 2019
Author: Dr. Phil Winder
CEO

Nearly everyone using Python for Data Science has used or is using the Pandas Data Analysis/Preprocessing library. It is as much of a mainstay as Scikit-Learn. Despite this, one continuing bugbear is the different core data types used by each: pandas.DataFrame and np.ndarray. Wouldn’t it be great if we didn’t have to worry about converting DataFrames to numpy types and back again? Yes, it would. Step forward Scikit Pandas. Sklearn Pandas, part of the Scikit Contrib package, adds some syntactic sugar to pass DataFrames into sklearn pipelines and get them back out again.

Principal Component Analysis

Published: Jan 28, 2018

Dimensionality Reduction: Principal Component Analysis. This workshop is from Winder.ai. Sometimes data has redundant dimensions. For example, when predicting a person's weight from their height, you would expect that information about their eye colour provides no predictive power. In this simple case we can simply remove that feature from the data. With more complex data it is usual to have combinations of features that provide predictive power.
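
To illustrate the idea of combining features, here is a minimal sketch of principal component analysis using NumPy's singular value decomposition. It is not the workshop's own code; the function name `pca` and the toy data are assumptions for illustration.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its first n_components principal components."""
    # Centre each feature so the components describe variance around the mean
    X_centred = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value
    _, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    components = Vt[:n_components]
    # Variance explained by each retained direction
    explained_variance = (S ** 2)[:n_components] / (len(X) - 1)
    return X_centred @ components.T, components, explained_variance


# Hypothetical data: the second feature is exactly twice the first,
# so a single component captures all of the variance.
X = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
projected, components, variance = pca(X, n_components=1)
```

In this toy case the two original features collapse into one combined feature without losing information; with real data some variance is usually discarded.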

Distance Measures with Large Datasets

Published: Jan 1, 2018

Distance Measures for Similarity Matching with Large Datasets. Today I had an interesting question from a client who was using a distance metric for similarity matching. The problem I face is: given one vector v and a list of vectors X, how do I calculate the Euclidean distance between v and each vector in X in the most efficient way possible in order to get the top matching vectors? A distance measure is a measure of how similar one observation is to a set of other observations.
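
One vectorised way to approach the question above is sketched below with NumPy. It is an illustration under assumptions, not necessarily the method the post settles on, and it makes no claim to be the fastest option; the function name `top_k_nearest` and the sample data are hypothetical.

```python
import numpy as np


def top_k_nearest(v, X, k):
    """Return indices and Euclidean distances of the k rows of X closest to v."""
    diffs = X - v
    # Squared distances avoid a square root for every row; ordering is unchanged
    sq_dists = np.einsum("ij,ij->i", diffs, diffs)
    k = min(k, len(X))
    # argpartition finds the k smallest without fully sorting all rows
    idx = np.argpartition(sq_dists, k - 1)[:k]
    # Sort only the k candidates so results come back nearest first
    idx = idx[np.argsort(sq_dists[idx])]
    return idx, np.sqrt(sq_dists[idx])


# Hypothetical example data
X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [10.0, 10.0]])
v = np.array([0.0, 0.0])
indices, distances = top_k_nearest(v, X, k=2)
```

The partial sort matters when X is large and k is small, because a full sort of every distance is wasted work when only the top matches are needed.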

603: Nearest Neighbour Tips and Tricks

Published: Jan 1, 2018

Dimensionality and domain knowledge. Is it right to use the same distance measure for all features? E.g. height and sex? CPU and disk space? Some features will have more of an effect than others due to their scales. In this version of the algorithm all features are used in the distance calculation. This treats all features the same. So a measure of height has the same effect as the measure of sex.
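
A common way to stop one feature dominating a distance calculation purely because of its units is to standardise each feature first. The sketch below is an assumption about how that could look, not the course's own code; `standardise` and the height/sex data are hypothetical.

```python
import numpy as np


def standardise(X):
    """Rescale each column to zero mean and unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Guard against constant columns, which would otherwise divide by zero
    std[std == 0] = 1.0
    return (X - mean) / std


# Hypothetical observations: height in centimetres and sex encoded as 0 or 1
X = np.array([[150.0, 0.0], [160.0, 1.0], [170.0, 0.0], [180.0, 1.0]])

# Raw distance between the first two rows is dominated by the height difference
raw_distance = np.linalg.norm(X[0] - X[1])

# After standardising, both features contribute on a comparable scale
scaled = standardise(X)
scaled_distance = np.linalg.norm(scaled[0] - scaled[1])
```

Whether every feature should contribute equally is still a domain decision; scaling only removes the accidental effect of units.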

602: Nearest Neighbour Classification and Regression

Published: Jan 1, 2018

More than just similarities. Classification: predict the same class as the nearest observations. Regression: predict the same value as the nearest observations. Remember, for classification tasks we want to predict a class for a new observation. What we could do is predict a class that is the same as the nearest neighbour. Simple! For regression tasks, we need to predict a value. Again, we could use the value of the nearest neighbour!
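
The two ideas above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming Euclidean distance and a simple majority vote or mean over the k nearest observations; the function names and training data are hypothetical and not taken from the course material.

```python
from collections import Counter

import numpy as np


def nearest_indices(x, X, k):
    """Indices of the k training rows closest to x (Euclidean distance)."""
    dists = np.linalg.norm(X - x, axis=1)
    return np.argsort(dists)[:k]


def knn_classify(x, X, y, k=1):
    """Predict the most common class among the k nearest observations."""
    idx = nearest_indices(x, X, k)
    return Counter(y[i] for i in idx).most_common(1)[0][0]


def knn_regress(x, X, y, k=1):
    """Predict the mean value of the k nearest observations."""
    idx = nearest_indices(x, X, k)
    return float(np.mean(y[idx]))


# Hypothetical training data: two clusters with a class and a value each
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
classes = np.array(["a", "a", "b", "b"])
values = np.array([1.0, 2.0, 10.0, 12.0])

query = np.array([0.2, 0.1])
predicted_class = knn_classify(query, X_train, classes, k=1)
predicted_value = knn_regress(query, X_train, values, k=2)
```

With k=1 both functions simply copy the nearest neighbour's class or value, which is the case the text describes.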

601: Similarity and Nearest Neighbours

Published: Jan 1, 2018

This section introduces the idea of “similarity”. Why? Simplicity; many business tasks require a measure of “similarity”; it works well. Business reasoning: Why would businesses want to use a measure of similarity? What business problems map well to similarity classifiers? Find similar companies on a CRM; find similar people in an online dating app; find similar configurations of machines in a data centre; find pictures of cats that look like this cat; recommend products to buy from similar customers; find similar wines. Similarity: What is similarity?