# Machine Learning - Winder.AI Blog

Industrial insight and articles from Winder.AI, focusing on Machine Learning.

## 601: Similarity and Nearest Neighbours

Mon Jan 1, 2018, in Training, Machine Learning

This section introduces the idea of “similarity”.

Why?

- Simplicity
- Many business tasks require a measure of “similarity”
- Works well

Business reasoning: why would businesses want to use a measure of similarity? What business problems map well to similarity classifiers?

- Find similar companies on a CRM
- Find similar people in an online dating app
- Find similar configurations of machines in a data centre
- Find pictures of cats that look like this cat
- Recommend products to buy from similar customers
- Find similar wines

Similarity: what is similarity?

## 602: Nearest Neighbour Classification and Regression

Mon Jan 1, 2018, in Training, Machine Learning

More than just similarities:

- Classification: predict the same class as the nearest observations
- Regression: predict the same value as the nearest observations

Remember that for classification tasks we want to predict a class for a new observation. One option is to predict the same class as the nearest neighbour. Simple! For regression tasks we need to predict a value. Again, we could use the value of the nearest neighbour!
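
The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming Euclidean distance and a single nearest neighbour; the function name `nearest_neighbour_predict` and the toy data are made up for the example.

```python
import numpy as np

def nearest_neighbour_predict(X_train, y_train, x_new):
    """Return the target of the training observation closest to x_new."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Euclidean distance from x_new to every training observation.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    return y_train[int(np.argmin(distances))]

# Hypothetical training data: two features per observation.
X_train = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.5]]
classes = ["cat", "cat", "dog", "dog"]   # targets for classification
values = [10.0, 12.0, 50.0, 55.0]        # targets for regression

# Classification: copy the class of the nearest neighbour.
predicted_class = nearest_neighbour_predict(X_train, classes, [4.8, 5.1])

# Regression: copy the value of the nearest neighbour.
predicted_value = nearest_neighbour_predict(X_train, values, [1.0, 1.1])
```

The same function serves both tasks because the only difference is the type of target being copied from the neighbour.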

## 603: Nearest Neighbour Tips and Tricks

Mon Jan 1, 2018, in Training, Machine Learning

Dimensionality and domain knowledge: is it right to use the same distance measure for all features? E.g. height and sex? CPU and disk space? Some features will have more of an effect than others due to their scales.

In this version of the algorithm all features are used in the distance calculation, and all of them are treated the same. So a measure of height has the same effect as the measure of sex.
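
To see how scale changes the answer, here is a small sketch that finds the nearest neighbour of one observation before and after rescaling. Standardising each feature is one common remedy and is an assumption here, not necessarily the approach the training recommends; the data and helper names are hypothetical.

```python
import numpy as np

# Hypothetical data: columns are height in cm and sex encoded as 0/1.
X = np.array([
    [150.0, 0.0],
    [160.0, 1.0],
    [170.0, 0.0],
    [180.0, 1.0],
])

def standardise(X):
    """Rescale each column to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def nearest_index(X, query_row):
    """Index of the row closest to X[query_row], excluding the row itself."""
    distances = np.linalg.norm(X - X[query_row], axis=1)
    distances[query_row] = np.inf  # never match an observation with itself
    return int(np.argmin(distances))

# On the raw data, height dominates because its numbers are much larger.
raw_neighbour = nearest_index(X, 0)

# After standardising, a difference in sex counts as much as a difference in height.
scaled_neighbour = nearest_index(standardise(X), 0)
```

With the raw features the nearest neighbour of the first row is the row with the closest height, regardless of sex. After standardising, the nearest neighbour becomes the row with the same sex.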

## Distance Measures with Large Datasets

Mon Jan 1, 2018, in Machine Learning, Workshop

Distance Measures for Similarity Matching with Large Datasets

Today I had an interesting question from a client who was using a distance metric for similarity matching:

> The problem I face is that given one vector v and a list of vectors X how do I calculate the Euclidean distance between v and each vector in X in the most efficient way possible in order to get the top matching vectors?
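
One possible way to approach the question is sketched below. It is not necessarily the solution the full post arrives at. It assumes `X` fits in memory as a NumPy array, and the function name `top_k_nearest` is made up for the example.

```python
import numpy as np

def top_k_nearest(v, X, k):
    """Return indices and Euclidean distances of the k rows of X closest to v."""
    diff = X - v
    # Squared distances preserve the ordering, so the square root is
    # only taken for the k winners.
    sq_dist = np.einsum("ij,ij->i", diff, diff)
    k = min(k, len(sq_dist))
    # argpartition finds the k smallest values without sorting everything.
    idx = np.argpartition(sq_dist, k - 1)[:k]
    # Sort just those k candidates so the best match comes first.
    idx = idx[np.argsort(sq_dist[idx])]
    return idx, np.sqrt(sq_dist[idx])

# Hypothetical data: four vectors and a query vector.
X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [10.0, 10.0]])
v = np.array([0.0, 0.0])

indices, distances = top_k_nearest(v, X, k=2)
```

The design choice here is to vectorise the distance calculation and avoid a full sort, which matters most when `X` has many rows and only a handful of matches are needed.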

## Detrending Seasonal Data

Thu Dec 21, 2017, in Machine Learning, Workshop

Welcome! This workshop is from Winder.ai. Sign up to receive more free workshops, training and videos.

`statsmodels` is a comprehensive library for time series data analysis, and it has a really neat set of functions to detrend data. So if you see that your features have any trends that are time-dependent, then give this a try. It’s essentially fitting the multiplicative model:

$y(t) = \text{Level} \times \text{Trend} \times \text{Seasonality} \times \text{Noise}$
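
To make the multiplicative model concrete, here is a plain NumPy sketch of the classical approach: estimate the slowly varying part with a centred moving average, then divide it out. This illustrates the idea only and is not the `statsmodels` implementation. The period of 12, the synthetic series and the helper name are all assumptions for the example.

```python
import numpy as np

period = 12                      # assumed seasonal period (e.g. monthly data)
t = np.arange(48)

# Synthetic series built from the multiplicative model (noise omitted for clarity).
level = 10.0
trend = 1.0 + 0.05 * t
seasonality = 1.0 + 0.2 * np.sin(2 * np.pi * t / period)
y = level * trend * seasonality

def centred_moving_average(series, period):
    """Estimate level * trend with a centred moving average (even period)."""
    weights = np.ones(period + 1)
    weights[0] = weights[-1] = 0.5   # half weights keep the window centred
    weights /= period
    return np.convolve(series, weights, mode="valid")

half = period // 2
trend_estimate = centred_moving_average(y, period)

# In a multiplicative model, detrending means dividing, not subtracting.
detrended = y[half:-half] / trend_estimate
```

The moving average cannot be computed for the first and last half period, so the detrended series is shorter than the input. What remains is approximately the seasonal component.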

## Evidence, Probabilities and Naive Bayes

Thu Dec 21, 2017, in Machine Learning, Workshop

Bayes' rule is one of the most useful parts of statistics. It allows us to estimate probabilities that would otherwise be impossible to obtain. In this worksheet we look at Bayes' rule at a basic level, then try a naive classifier.

Bayes' rule: for more intuition about Bayes' rule, make sure you check out the training.
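
As a quick reminder, Bayes' rule for a hypothesis $H$ and evidence $E$ is $P(H \mid E) = P(E \mid H) P(H) / P(E)$. The sketch below applies it to a binary hypothesis. The function name and the numbers (a 1% prior, a 90% likelihood and a 5% false positive rate) are hypothetical and chosen only to illustrate the calculation.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) for a binary hypothesis H given evidence E.

    prior:               P(H)
    likelihood:          P(E | H)
    false_positive_rate: P(E | not H)
    """
    # Total probability of seeing the evidence, whether or not H is true.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: a rare condition and a fairly accurate test.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
```

Even with a fairly accurate test, the posterior stays well below the likelihood because the prior is so small.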

## Hierarchical Clustering - Agglomerative

Thu Dec 21, 2017, in Machine Learning, Workshop

Clustering is an unsupervised task. In other words, we don’t have any labels or targets. This is common when you receive questions like “what can we do with this data?” or “can you tell me the characteristics of this data?”. There are quite a few different ways of performing clustering, but one way is to form clusters hierarchically.

## Qualitative Model Evaluation - Visualising Performance

Thu Dec 21, 2017, in Machine Learning, Workshop

Being able to evaluate models numerically is really important for optimisation tasks. However, performing a visual evaluation provides two main benefits:

- Easier to spot mistakes
- Easier to explain to other people

It is so easy to miss a gross error when looking at summary statistics alone. Always visualise your data/results!

## Quantitative Model Evaluation

Thu Dec 21, 2017, in Machine Learning, Workshop

We need to be able to compare models for a range of tasks. The most common use case is to decide whether changes to your model improve performance. Typically we want to visualise this, and we will in another workshop, but first we need to establish some quantitative measures of performance.
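
The excerpt does not say which measures the workshop uses, so the sketch below assumes three common ones for binary classification: accuracy, precision and recall. The labels and predictions are made up for the example.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical ground truth and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

metrics = binary_metrics(y_true, y_pred)
```

Computing the same measures before and after a change to the model gives a simple, repeatable way to decide whether the change helped.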

## Testing Model Robustness with Jitter

Thu Dec 21, 2017, in Machine Learning, Workshop

To test whether your models are robust to changes, one simple test is to add some noise to the test data. When we alter the magnitude of the noise, we can infer how well the model will perform with new data and different sources of noise. In this example we’re going to add some random, normally-distributed noise, but it doesn’t have to be normally distributed!
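
A minimal sketch of the jitter test follows. The data, the nearest-centroid model and the noise scales are all assumptions chosen to keep the example self-contained; in practice you would substitute your own trained model and test set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-class data: two well-separated Gaussian blobs.
def make_blobs():
    return np.vstack([
        rng.normal(0.0, 0.5, (50, 2)),
        rng.normal(5.0, 0.5, (50, 2)),
    ])

X_train, X_test = make_blobs(), make_blobs()
y_train = y_test = np.array([0] * 50 + [1] * 50)

# Stand-in model (an assumption): predict the class of the nearest centroid.
centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(X):
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def accuracy_with_jitter(scale):
    """Accuracy on the test set after adding normal noise of the given scale."""
    noisy = X_test + rng.normal(0.0, scale, X_test.shape)
    return float((predict(noisy) == y_test).mean())

# Increase the noise magnitude and record how the accuracy responds.
scales = [0.0, 0.5, 1.0, 2.0, 4.0]
results = {scale: accuracy_with_jitter(scale) for scale in scales}
```

Plotting `results` against the noise scale shows how quickly performance degrades. A model whose accuracy collapses at small scales is likely to be fragile on new data.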