Visualising Underfitting and Overfitting in High Dimensional Data

Welcome! This workshop is from Winder.ai. Sign up to receive more free workshops, training and videos.

In the previous workshop we plotted the decision boundary for underfitting and overfitting classifiers. This is great, but very often it is impossible to visualise the data, usually because there are too many dimensions in the dataset.

In this case we need to visualise performance in another way. One way to do this is to produce a validation curve. This is a brute-force approach that repeatedly scores the performance of a model on holdout data for each parameter value that you specify.
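As a sketch of the idea (assuming scikit-learn is available; the classifier, dataset and parameter range here are invented for illustration):

```python
# Hedged sketch of a validation curve: score a k-NN classifier on
# train and holdout folds for each candidate value of n_neighbors.
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsClassifier

# A synthetic high-dimensional dataset we cannot easily plot directly.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)
param_range = [1, 3, 5, 9, 15]

# For each parameter value, fit and score across 5 cross-validation folds.
train_scores, test_scores = validation_curve(
    KNeighborsClassifier(), X, y,
    param_name="n_neighbors", param_range=param_range, cv=5)

# One row per parameter value, one column per fold.
print(train_scores.shape)  # (5, 5)
```

Plotting the mean of each row of `train_scores` against `test_scores` reveals where the model starts to overfit (train score high, holdout score falling).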

Nearest Neighbour Algorithms

Nearest neighbour algorithms are a class of algorithms that use some measure of similarity. They rely on the premise that observations which are close to each other (when comparing all of the features) are similar to each other.

Making this assumption, we can do some interesting things like:

  • Make recommendations
  • Find similar items

But more crucially, they provide an insight into the character of the data.
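The "closeness" above is just a distance computed across all of the features. A minimal sketch with plain NumPy (the points are made up for illustration):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([1.0, 3.0])
c = np.array([8.0, 9.0])

def euclidean(p, q):
    """Straight-line distance, comparing all of the features at once."""
    return np.sqrt(np.sum((p - q) ** 2))

# b is far closer to a than c is, so we treat a and b as more similar.
print(euclidean(a, b))  # 1.0
print(euclidean(a, c))  # ~9.9
```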

K-NN For Classification

In a previous workshop we investigated how the nearest neighbour algorithm uses the concept of distance as a similarity measure.

We can also use this concept of similarity for classification, i.e. a new observation will be classified in the same way as its neighbours.

This is accomplished by finding the most similar observations and setting the predicted class to some combination of the classes of the k nearest neighbours (e.g. the most common).
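A minimal sketch, assuming scikit-learn's `KNeighborsClassifier` on a tiny invented dataset:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated clusters, labelled 0 and 1.
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# Predict using the most common class among the 3 nearest neighbours.
model = KNeighborsClassifier(n_neighbors=3).fit(X, y)
preds = model.predict([[0.5, 0.5], [5.5, 5.5]])
print(preds)  # [0 1]
```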

Overfitting and Underfitting

Imagine you had developed a model that predicts some output. The goal of any model is to generate a correct prediction and avoid incorrect predictions. But how can we be sure that predictions are as good as they can possibly be?

Now constrain your imagining to a classification task (other tasks have similar properties, but I find classification the easiest to reason about). We use some data to train the model. The result of the training process is a decision boundary, i.e. class A on one side, class B on the other.
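The classic symptom of overfitting is a model that scores perfectly on its training data but noticeably worse on holdout data. A hedged sketch (the dataset, classifier and noise level are all assumptions, chosen to make the effect visible):

```python
# An unrestricted decision tree memorises noisy training labels.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 deliberately mislabels 20% of observations (noise).
X, y = make_classification(n_samples=300, n_features=10, flip_y=0.2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(max_depth=None, random_state=0)
deep.fit(X_train, y_train)

# Perfect on the data it memorised...
print(deep.score(X_train, y_train))  # 1.0
# ...but worse on data it has never seen: the signature of overfitting.
print(deep.score(X_test, y_test))
```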

Support Vector Machines

If you remember from the video training, SVMs are classifiers that attempt to maximise the separation between classes, no matter what the distribution of the data is. This means that they can sometimes fit the noise more than they fit the data.

But because they are aiming to separate classes, they do a really good job at optimising for accuracy. Let’s investigate this below.
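As a quick sketch of the margin-maximising idea (assuming scikit-learn's `SVC`; the linearly separable toy dataset is invented):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel places the boundary to maximise the gap between classes.
model = SVC(kernel="linear", C=1.0).fit(X, y)
preds = model.predict([[0.5, 0.5], [4.5, 4.5]])
print(preds)  # [0 1]

# Only the observations nearest the boundary become support vectors;
# the rest of the data's distribution is ignored.
print(len(model.support_vectors_))
```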

Logistic Regression

I find the name logistic regression annoying. We don’t normally use logistic regression for anything other than classification, but statisticians coined the name long ago.

Despite the name, logistic regression is incredibly useful. Instead of minimising a distance-based error like we did in standard linear regression, we can frame the problem probabilistically: logistic regression attempts to separate classes based upon the probability that an observation belongs to a class.
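A minimal sketch of the probabilistic output, assuming scikit-learn's `LogisticRegression` on an invented one-feature dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Class 0 lives near x=0..2, class 1 near x=8..10.
X = np.array([[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# The model outputs the probability that a new observation
# belongs to each class, rather than a raw distance.
proba = model.predict_proba([[1.0], [9.0]])
print(proba.round(2))
```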

Linear Classification

We learnt that we can use a linear model (and possibly gradient descent) to fit a straight line to some data. To do this we minimised the mean squared error (the optimisation function, often known as the loss or cost function) between our prediction and the data.

It’s also possible to slightly change the optimisation function to fit the line to separate classes. This is called linear classification.
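As a minimal sketch of that change (assuming scikit-learn; the hinge loss and the toy dataset are illustrative choices, not the only options):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4]])
y = np.array([0, 0, 0, 1, 1, 1])

# Same linear model and gradient descent machinery as regression, but
# the loss now penalises misclassification instead of squared distance.
model = SGDClassifier(loss="hinge", random_state=0).fit(X, y)
preds = model.predict([[0.5, 0.5], [4.5, 4.5]])
print(preds)  # [0 1]
```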

Regression: Dealing With Outliers

Outliers are observations that are spurious. You can usually spot outliers visually; they are often far away from the rest of the observations.

Sometimes they are caused by a measurement error, sometimes noise and occasionally they can be observations of interest (e.g. fraud detection).

But outliers skew estimates of the mean and standard deviation, and therefore affect linear models that use error measures which assume normality (e.g. the mean squared error).
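A minimal numeric sketch (the values are invented): a single outlier drags the mean and standard deviation a long way, while a robust statistic like the median barely moves.

```python
import numpy as np

clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
with_outlier = np.append(clean, 100.0)

# The mean is pulled far from the bulk of the data by one observation.
print(clean.mean(), with_outlier.mean())          # 3.0 vs ~19.2
# The median is barely affected.
print(np.median(clean), np.median(with_outlier))  # 3.0 vs 3.5
# The standard deviation explodes, distorting any normality assumption.
print(clean.std(), with_outlier.std())
```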

Linear Regression

Regression is a traditional task from statistics that attempts to fit a model to some input data to predict the numerical value of an output. The data is assumed to be continuous.

The goal is to be able to take a new observation and predict the output with minimal error. Some examples might be “what will next quarter’s profits be?” and “how many widgets do we need to stock in order to fulfil demand?”.
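A minimal sketch, assuming scikit-learn's `LinearRegression` and made-up data that follows y = 2x + 1 exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression().fit(X, y)

# Predict the output for a new observation.
print(model.predict([[5.0]]))         # ≈ [11.]
# The fitted slope and intercept recover the generating line.
print(model.coef_, model.intercept_)  # ≈ [2.] 1.0
```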

Introduction to Gradient Descent

For only a few algorithms does an analytical solution exist. For example, we can use the Normal Equation to solve a linear regression problem directly.

However, for most algorithms we cannot solve the problem analytically, usually because no closed-form solution to the equation exists. So instead we have to try something else.
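A sketch contrasting the two routes with NumPy only, on made-up data following y = 2x + 1 (the learning rate and iteration count are assumptions chosen to converge here, not universal settings):

```python
import numpy as np

# Design matrix: a bias column of ones plus one feature.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Analytical route: the Normal Equation, theta = (X^T X)^-1 X^T y.
theta_exact = np.linalg.inv(X.T @ X) @ X.T @ y

# Iterative route: gradient descent on the mean squared error.
theta = np.zeros(2)
for _ in range(5000):
    gradient = 2 / len(y) * X.T @ (X @ theta - y)
    theta -= 0.05 * gradient  # step downhill along the gradient

print(theta_exact.round(3))  # [1. 2.]  (intercept, slope)
print(theta.round(3))        # converges to the same answer
```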
