Machine Learning Presentation: Packaging Your Models

Author: Dr. Phil Winder, CEO

Dr. Phil Winder shares Winder.AI’s machine learning consulting experience at a variety of large and small organizations.

Abstract: In this talk he focuses on packaging ML models for production serving. Learn how the cloud vendors compare, which abstractions the orchestrators prefer, and how packaging tools seek to find the right abstractions. At the end of the talk Phil distils this information and presents best practices. There is also some discussion of future trends and some ideas for aspiring open-source engineers.


Machine Learning Presentation: Provenance and Lineage for Data, Pipelines, and Deployments


Dr. Phil Winder shares Winder.AI’s machine learning consulting experience at a variety of large and small organizations.

Abstract: In this talk he focuses on how provenance and lineage, typically thought of as model deployment concerns, can help make the development of machine learning models more repeatable, understandable, and robust. Discover the difference between lineage and provenance. Learn how to determine the “strength” of your lineage and how robust it is to failure.


The Value of a Machine Learning Pipeline: Past, Present, and the Future of MLOps With Kubeflow


Industrial machine learning consulting projects come in a variety of forms. Sometimes clients ask for exploratory data analysis, to evaluate whether their data can help solve a problem using artificial intelligence. Other times we use machine learning (ML) algorithms to automate decisions and improve efficiency within a business or product. More recently we have refocused on reinforcement learning, and customers ask us to help control complex multi-step processes.


Fast Time-Series Filters in Python


Time-series (TS) filters are often used in digital signal processing for distributed acoustic sensing (DAS). The goal is to remove a subset of frequencies from a digitised TS signal. To filter a signal you must touch all of the data and perform a convolution. This is a slow process when you have a large amount of data. The purpose of this post is to investigate which filters are fastest in Python.
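As an illustrative sketch (not the benchmark code from the post), filtering a signal in SciPy looks like this; the filter order, cut-off frequency, and synthetic signal are arbitrary choices for the example:

```python
import numpy as np
from scipy import signal

# Synthetic signal: a 10 Hz tone plus a 200 Hz component, sampled at 1 kHz.
fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)

# 4th-order Butterworth low-pass filter with a 50 Hz cut-off.
# Applying it touches every sample, which is why large datasets are slow.
b, a = signal.butter(4, 50.0 / (fs / 2.0), btype="low")
y = signal.lfilter(b, a, x)
```

`lfilter` is only one of several SciPy options (others include `filtfilt`, `sosfilt`, and FFT-based convolution); comparing their speed is exactly what the post sets out to do.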


Scikit Learn to Pandas: Data types shouldn't be this hard


Nearly everyone using Python for data science has used or is using the Pandas data analysis/preprocessing library. It is as much of a mainstay as Scikit-Learn. Despite this, one continuing bugbear is the different core data types used by each: pandas.DataFrame and np.array. Wouldn’t it be great if we didn’t have to worry about converting DataFrames to numpy types and back again? Yes, it would. Step forward Sklearn Pandas, part of the Scikit Contrib package, which adds some syntactic sugar to use DataFrames in sklearn pipelines and back again.
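As a minimal sketch of the round trip that motivates the library (using only pandas and Scikit-Learn, with made-up column names and data):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"height": [1.6, 1.7, 1.8], "weight": [60.0, 70.0, 80.0]})

# Scikit-Learn transformers accept a DataFrame but return a bare np.array,
# discarding the column labels and index...
scaled = StandardScaler().fit_transform(df)

# ...so you must rebuild the DataFrame by hand to keep them.
df_scaled = pd.DataFrame(scaled, columns=df.columns, index=df.index)
```

Sklearn Pandas’ DataFrameMapper is designed to hide exactly this boilerplate.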


Principal Component Analysis


Dimensionality Reduction: Principal Component Analysis. This workshop is from Winder.AI. Sometimes data has redundant dimensions. For example, when predicting weight from height data, you would expect that information about eye colour provides no predictive power. In this simple case we can remove that feature from the data. With more complex data, however, it is usual to find combinations of features that together provide predictive power.
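A small illustrative example (synthetic data, not from the workshop): two perfectly correlated features collapse onto a single principal component, exposing the redundancy automatically:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
height_cm = rng.normal(170, 10, size=200)

# Column 2 is height in inches: a linear copy of column 1, i.e. redundant.
X = np.column_stack([height_cm, height_cm / 2.54, rng.normal(0, 1, size=200)])

pca = PCA().fit(X)

# The redundant pair shares one component, so the last ratio is ~0.
print(pca.explained_variance_ratio_)
```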


Distance Measures with Large Datasets


Distance Measures for Similarity Matching with Large Datasets. Today I had an interesting question from a client who was using a distance metric for similarity matching: given one vector v and a list of vectors X, how do I calculate the Euclidean distance between v and each vector in X as efficiently as possible, in order to find the top matching vectors? A distance measure quantifies how similar one observation is to a set of other observations.
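One common answer (a sketch with NumPy, using made-up data sizes) is to let broadcasting compute every distance at once and then partially sort:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 64))  # the database of vectors
v = rng.normal(size=64)            # the query vector

# Broadcasting X - v computes all 10,000 difference vectors in one
# vectorised pass; no Python-level loop over the rows of X.
dists = np.linalg.norm(X - v, axis=1)

# argpartition finds the k smallest distances in O(n) without a full sort;
# a final sort of just the k winners puts them in order.
k = 5
top_k = np.argpartition(dists, k)[:k]
top_k = top_k[np.argsort(dists[top_k])]
```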


603: Nearest Neighbour Tips and Tricks


Dimensionality and domain knowledge. Is it right to use the same distance measure for all features, e.g. height and sex, or CPU and disk space? Some features will have more of an effect than others because of their scales. In this version of the algorithm all features are used in the distance calculation and all are treated the same, so a measure of height has the same effect as a measure of sex.
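A toy illustration (invented numbers) of why scale matters: on raw units, a 15 cm height difference swamps the binary sex feature entirely, while standardising first puts both features on a comparable footing:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Four observations: [height in cm, sex encoded as 0/1].
X = np.array([[180.0, 0.0],
              [165.0, 1.0],
              [170.0, 0.0],
              [175.0, 1.0]])

# Raw Euclidean distance between the first two rows: the 15 cm gap
# contributes 225 to the squared distance, the sex feature only 1.
raw = np.linalg.norm(X[0] - X[1])

# After standardising, both features contribute on the same scale.
Xs = StandardScaler().fit_transform(X)
scaled = np.linalg.norm(Xs[0] - Xs[1])
```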


602: Nearest Neighbour Classification and Regression


More than just similarities. Classification: predict the same class as the nearest observations. Regression: predict the same value as the nearest observations. Remember that for classification tasks we want to predict a class for a new observation. We could simply predict the same class as the nearest neighbour. Simple! For regression tasks we need to predict a value; again, we could use the value of the nearest neighbour.
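This is precisely what Scikit-Learn’s nearest-neighbour estimators do; a minimal sketch with k=1 and made-up one-dimensional data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = np.array([[1.0], [2.0], [10.0], [11.0]])
y_class = np.array([0, 0, 1, 1])
y_value = np.array([1.5, 2.5, 10.5, 11.5])

# With n_neighbors=1 the prediction is copied from the single
# nearest training observation.
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y_class)
reg = KNeighborsRegressor(n_neighbors=1).fit(X, y_value)

pred_class = clf.predict([[1.4]])   # nearest point is 1.0 -> class 0
pred_value = reg.predict([[10.4]])  # nearest point is 10.0 -> value 10.5
```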


601: Similarity and Nearest Neighbours


This section introduces the idea of “similarity”. Why? Because it is simple, many business tasks require a measure of “similarity”, and it works well. Why would businesses want to use a measure of similarity? What business problems map well to similarity classifiers?

- Find similar companies on a CRM
- Find similar people in an online dating app
- Find similar configurations of machines in a data centre
- Find pictures of cats that look like this cat
- Recommend products to buy from similar customers
- Find similar wines

What is similarity?
