More than just similarities


Remember for classification tasks, we want to predict a class for a new observation.

What we could do is predict a class that is the same as the nearest neighbour. Simple!

For regression tasks, we need to predict a value. Again, we could use the value of the nearest neighbour! Simple again!

Nearest neighbour classification

Nearest neighbour classification

Nearest neighbour regression

Nearest neighbour regression


We’ve used the nearest neighbour in these examples to demonstrate the idea.

But in reality we don’t want to use a single neighbour. It could be noise.

Instead, we can use a weighting of some number of neighbours. For example in classification we would take predict the class which had the majority of nearest neighbours. For regression we might take the average value.

Equally, we could estimate the class probability by looking at the proportion of the nearest neighbours belonging to the predicted class.

(Beware of class probabilities on small numbers of observations!)

Generally, when we’re using more than one neighbour the name of the algorithm is shortened to k-NN.

Where k refers to the number of neighbours used in the algorithm.

In general, higher values of k perform more averaging and give “smoother” results.

Let’s take another look at the “iris” dataset with various values of k.


Bias and variance part duex

What we’re seeing here is overfitting and over-generalisation in practice.

By choosing low values of k we are highly sensitive to outliers. We overfit.

But choosing high values of k, the model might not be complex enough to represent the data.

How do we pick a value of k?

We saw before how to use validation to pick make sure we are not over or underfitting.

We can do the same here, by varying the value of k and validating the result.

But be careful of the choice of K. It should be…

E.g. two classes, if we used a value for K of 6, we could have ties.

To give each class opportunity to have it’s say.


The pros of using k-NN are obvious and were stated at the beginning.

However, there are issues. Of which we discuss next.



“your mortgage application was declined because of your similarity to three people that defaulted, whom were also Danish”


First, and this doesn’t necessarily apply only to k-NN, how do you explain and justify the result?

Netflix justifies their k-NN recommendation by saying “you might like ‘I’m Alan Partridge’, because you liked both ‘Brass Eye’ and ‘Borat’”. And you’re probably ok with this justification.

But if you were declined a mortgage because “your application has been declined because your circumstances (where you live) were similar to those who have defaulted”. This might not be as easy to chew.

Furthermore, it is harder to learn something from the model, because we’re not actually modelling anything. We’ve basically created a sophisticated look up table. Stakeholders may not like this result.