Machine Learning

Industrial insight and articles from Winder.AI, focusing on the topic of Machine Learning

503: Visualising Overfitting in High Dimensional Problems

Published: Jan 1, 2018

Validation curve: one simple method of visualising overfitting is a validation curve (a.k.a. fitting curve). This is a plot of a score (e.g. accuracy) versus some parameter of the model. Let's use the make_circles dataset again and vary the gamma parameter of an SVM with an RBF kernel. (Figure: performance of the RBF SVM when altering the RBF's \(\gamma\) parameter.) We can see that we are underfitting at low values of \(\gamma\). So we can make the model more complex by increasing \(\gamma\), which allows the SVM to fit smaller and smaller kernels.
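A validation curve like this can be computed with scikit-learn's `validation_curve`. This is a sketch, assuming the `noise` and `factor` settings for `make_circles`, the range of `gamma` values and 5-fold cross-validation (none of these come from the original); it records the training and validation accuracy for each \(\gamma\).

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

# Noisy concentric circles; noise and factor are assumed values for illustration.
X, y = make_circles(n_samples=300, noise=0.1, factor=0.5, random_state=42)

# Log-spaced gamma values to sweep (an assumed range).
gammas = np.logspace(-3, 3, 7)

# For each gamma, fit an RBF-kernel SVM with 5-fold cross-validation.
# Both score arrays have shape (len(gammas), number of folds).
train_scores, test_scores = validation_curve(
    SVC(kernel="rbf"), X, y,
    param_name="gamma", param_range=gammas,
    cv=5, scoring="accuracy",
)

for g, tr, te in zip(gammas, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"gamma={g:9.3f}  train={tr:.2f}  validation={te:.2f}")
```

Plotting both columns against \(\gamma\) on a log axis gives the validation curve; a widening gap between training and validation accuracy at high \(\gamma\) is the signature of overfitting.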

502: Preventing Overfitting with Holdout

Published: Jan 1, 2018

Holdout: so far we have been measuring performance on the training data, which is not representative of production. We want to pretend that we are seeing new data, so we hold back some data. When we train the model, we do so on some data; this is called the training data. Up to now, we have been using that same training data to measure our accuracy. If we simply created a lookup table of the training examples, our accuracy would be 100%, but this doesn't generalise to new examples.
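The lookup-table problem and the holdout fix can be sketched in a few lines of NumPy. This assumes a hypothetical 1-D dataset whose true rule is \(x > 0.5\) with 20% of labels flipped as noise, and a 70/30 split; the "lookup table" is stood in for by a 1-nearest-neighbour rule, an illustrative choice rather than anything from the original.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical 1-D data: the true rule is x > 0.5, but 20% of labels are flipped (noise).
n = 200
x = rng.uniform(0.0, 1.0, size=n)
y = (x > 0.5).astype(int)
flip = rng.random(n) < 0.2
y[flip] = 1 - y[flip]

# Holdout: shuffle, then hold back 30% of the data that the model never sees in training.
idx = rng.permutation(n)
n_test = int(0.3 * n)
test_idx, train_idx = idx[:n_test], idx[n_test:]
x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]

def lookup_predict(x_query):
    """'Lookup table' model: return the label of the closest training point (1-NN)."""
    nearest = np.abs(x_query[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[nearest]

train_acc = (lookup_predict(x_train) == y_train).mean()
test_acc = (lookup_predict(x_test) == y_test).mean()
print(f"training accuracy: {train_acc:.2f}, holdout accuracy: {test_acc:.2f}")
```

Every training point looks itself up, so training accuracy is perfect; only the held-back data reveals how the model copes with examples it has not memorised.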

501: Over and Underfitting

Published: Jan 1, 2018

Generalisation and overfitting: “enough rope to hang yourself with”. We can create classifiers whose decision boundary has any shape, which makes it very easy to overfit the data. This section is all about what overfitting is and why it is bad. Speaking generally, we have so much flexibility that we could end up overfitting the data. This is where chance patterns in the data, i.e. noise, are treated as a valid part of the model.
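The same effect shows up in regression. This minimal NumPy sketch, assuming hypothetical noisy samples of a sine curve, fits polynomials of increasing degree: the more flexible models always fit the 20 training points at least as well, but the extra flexibility gets spent on the noise, which typically shows up as a larger error on fresh data.

```python
import numpy as np

rng = np.random.default_rng(seed=5)

def make_data(n):
    # Hypothetical data: a smooth underlying curve plus random noise.
    x = rng.uniform(0.0, 1.0, size=n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

errors = {}  # degree -> (training MSE, test MSE)
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    errors[degree] = (train_mse, test_mse)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```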

404: Nonlinear, Linear Classification

Published: Jan 1, 2018

Nonlinear functions: sometimes data cannot be separated by a simple threshold or linear boundary, so we can also use nonlinear functions as a decision boundary. To represent more complex data, we can introduce nonlinearities. Before we do, bear in mind that more complex interactions between features yield solutions that overfit the data, so to compensate we will need more data; that more complex solutions take more computational power; and that all this goes against the KISS (“keep it simple”) principle. The simplest way of adding nonlinearities is to add various combinations of the original features, such as squares and pairwise products.
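To illustrate adding feature combinations, here is a minimal NumPy sketch, assuming two hypothetical concentric circles of points. A least-squares linear model on ±1 targets stands in for a proper classifier (a simplification for illustration). With only the original features, no straight line separates the circles; adding the squares and the product \(x_1x_2\) makes a linear boundary in the expanded feature space sufficient, because \(x_1^2 + x_2^2\) is the squared radius.

```python
import numpy as np

# Hypothetical data: class -1 on a circle of radius 0.5, class +1 on a circle of radius 1.0.
theta = np.linspace(0.0, 2 * np.pi, 50, endpoint=False)
ring = np.column_stack([np.cos(theta), np.sin(theta)])
X = np.vstack([0.5 * ring, 1.0 * ring])
y = np.concatenate([-np.ones(50), np.ones(50)])

def linear_features(X):
    # Bias term plus the original features x1, x2.
    return np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1]])

def quadratic_features(X):
    # Bias, the originals, plus degree-2 combinations: x1^2, x2^2 and x1*x2.
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

def fit_and_score(features):
    # Least-squares linear model on the expanded features; classify by the sign of the output.
    A = features(X)
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    predictions = np.where(A @ w > 0, 1.0, -1.0)
    return (predictions == y).mean()

acc_linear = fit_and_score(linear_features)
acc_quadratic = fit_and_score(quadratic_features)
print(f"linear features: {acc_linear:.2f}, with quadratic combinations: {acc_quadratic:.2f}")
```

The model stays linear in its parameters; only the inputs changed, which is why this is the cheapest way to get a nonlinear boundary.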

403: Linear Classification

Published: Jan 1, 2018

Classification via a model: decision trees create one-dimensional decision boundaries, and we could easily imagine using a linear model to define a decision boundary instead. Previously we used fixed decision boundaries to segment the data, based upon how informative each segmentation would be. Each of those decision boundaries is a one-dimensional rule (a threshold on a single feature) that separates the data. We could easily increase the number or complexity of the parameters used to define the boundary.
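One way to learn a linear decision boundary is the perceptron rule, used here as an illustrative stand-in since the text doesn't name a learning algorithm. This minimal sketch assumes hypothetical 2-D data that is linearly separable with a margin, and predicts the class from the sign of \(\mathbf{w} \cdot \mathbf{x} + b\).

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Hypothetical 2-D data, separable by the line x1 + x2 = 1.
points = rng.uniform(0.0, 1.0, size=(400, 2))
s = points.sum(axis=1)
keep = np.abs(s - 1.0) > 0.2  # drop points near the line so a margin exists
X, y = points[keep], np.where(s[keep] > 1.0, 1, -1)

# Linear decision boundary: predict sign(w . x + b).
w, b = np.zeros(2), 0.0
for epoch in range(1000):
    mistakes = 0
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:  # misclassified, or exactly on the boundary
            w += yi * xi            # perceptron update moves the boundary towards xi
            b += yi
            mistakes += 1
    if mistakes == 0:               # every training point is on the correct side
        break

accuracy = np.mean(np.where(X @ w + b > 0, 1, -1) == y)
print(f"w={w}, b={b:.1f}, training accuracy={accuracy:.2f}")
```

Unlike a tree split, the learned boundary can lie at any angle, because it uses both features at once.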

402: Optimisation and Gradient Descent

Published: Jan 1, 2018

Optimisation: when discussing regression we found that it has a closed-form solution, i.e. one that can be computed directly. For many other algorithms no closed-form solution is available, and in these cases we need to use an optimisation algorithm. The goal of these algorithms is to step iteratively towards the correct result. Gradient descent: given a cost function, the gradient descent algorithm calculates the gradient of the cost at the current parameters and takes a step in the opposite direction (downhill), scaled by a step size known as the learning rate.
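The update rule above can be sketched in NumPy. This is a minimal illustration, assuming a hypothetical 1-D dataset, the MSE cost from linear regression and a fixed learning rate of 0.1 (all choices made for the example). Because linear regression does have a closed-form solution, we can check that gradient descent ends up in the same place.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical data: y = 2 + 3x plus a little noise.
x = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=100)
X = np.column_stack([np.ones_like(x), x])  # column of ones so that w[0] is the intercept

def mse_gradient(w):
    # Gradient of MSE = (1/n) * sum((X w - y)^2) with respect to w.
    residuals = X @ w - y
    return (2.0 / len(y)) * X.T @ residuals

w = np.zeros(2)       # starting point
learning_rate = 0.1   # assumed step size: too large diverges, too small converges slowly
for _ in range(1000):
    w -= learning_rate * mse_gradient(w)  # step against the gradient, i.e. downhill

# Compare with the closed-form least-squares solution.
w_closed, *_ = np.linalg.lstsq(X, y, rcond=None)
print("gradient descent:", w, "closed form:", w_closed)
```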

401: Linear Regression

Published: Jan 1, 2018

Regression and Linear Classifiers: traditional linear regression (a.k.a. Ordinary Least Squares) is the simplest and classic form of regression. Given a linear model in the form of:

\begin{align}
f(\mathbf{x}) & = w_0 + w_1x_1 + w_2x_2 + \dots \\
& = \mathbf{w}^T \mathbf{x}
\end{align}

where \(x_0 = 1\) so that \(w_0\) acts as the intercept, linear regression finds the parameters \(\mathbf{w}\) that minimise the mean squared error (MSE)… The MSE is the mean of the squared differences between the predicted values and the actual values:

\begin{align}
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - f(\mathbf{x}_i) \right)^2
\end{align}
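A minimal sketch of OLS in NumPy, assuming hypothetical noiseless data generated from known weights so the fit can be checked. It solves the normal equations \((\mathbf{X}^T\mathbf{X})\mathbf{w} = \mathbf{X}^T\mathbf{y}\), one standard closed-form route; for badly conditioned data, `np.linalg.lstsq` is the numerically safer choice.

```python
import numpy as np

# Hypothetical noiseless data generated from known weights w = [1.0, 2.0, -0.5].
rng = np.random.default_rng(seed=2)
features = rng.normal(size=(50, 2))
X = np.column_stack([np.ones(len(features)), features])  # x0 = 1, so w0 is the intercept
true_w = np.array([1.0, 2.0, -0.5])
y = X @ true_w

# Closed-form OLS: solve the normal equations (X^T X) w = X^T y.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

mse = np.mean((X @ w_hat - y) ** 2)  # mean of the squared differences
print("estimated weights:", w_hat, "MSE:", mse)
```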

302: How to Engineer Features

Published: Jan 1, 2018

Engineering features: you want to do this because it can reduce the number of features without losing information, produce better features than the originals, and make the data more suitable for training. Another part of the data wrangling challenge is to create better features from the current ones. Distribution/model-specific rescaling: many models assume, or work best with, normally distributed data. If you can, transform the data to be (approximately) normal. Infer the distribution from the histogram (and confirm by fitting candidate distributions).
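A minimal sketch of a distribution-specific rescaling, assuming a hypothetical right-skewed, positive feature drawn from a log-normal distribution. Taking the log is one common transform for data of this shape (others exist, e.g. Box-Cox); sample skewness serves as a rough check of how close to symmetric the result is.

```python
import numpy as np

def skewness(values):
    # Sample skewness: the third standardised moment; 0 for a symmetric distribution.
    centred = values - values.mean()
    return np.mean(centred**3) / np.mean(centred**2) ** 1.5

rng = np.random.default_rng(seed=4)
# Hypothetical right-skewed feature, drawn from a log-normal distribution.
raw = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
transformed = np.log(raw)  # the log of a log-normal variable is normally distributed

print(f"skew before: {skewness(raw):.2f}, after: {skewness(transformed):.2f}")
```

In practice the histogram tells you which transform to try; the skewness (or a fitted distribution) confirms whether it worked.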

301: Data Engineering

Published: Jan 1, 2018

Your job depends on your data. The goal of this section is to talk about what data is and the context provided by your domain, discover how to massage data to produce the best results, and find out how and where we can discover new data. If you have inadequate data, you will not be able to succeed in any data science task. More generally, I want you to focus on your data.

203: Examples and Decision Trees

Published: Jan 1, 2018

Example: Segmentation via Information Gain. There’s a fairly famous dataset called the “mushroom dataset”. It describes whether mushrooms are edible or not, depending on an array of features. The nice thing about this dataset is that the features are all categorical, so we can go through and segment the data for each value in a feature. This is some example data:

poisonous  cap-shape  cap-surface  cap-color  bruises?
p          x          s            n          t
e          x          s            y          t
e          b          s            w          t
p          x          y            w          t
e          x          s            g          f
etc.
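Segmenting on each feature and scoring the split can be sketched with the Python standard library. This uses only the five example rows above (not the full dataset), measures entropy in bits, and defines information gain as the parent entropy minus the size-weighted entropy of the segments.

```python
from collections import Counter
from math import log2

# The five example rows: (poisonous, cap-shape, cap-surface, cap-color, bruises?).
rows = [
    ("p", "x", "s", "n", "t"),
    ("e", "x", "s", "y", "t"),
    ("e", "b", "s", "w", "t"),
    ("p", "x", "y", "w", "t"),
    ("e", "x", "s", "g", "f"),
]
features = ["cap-shape", "cap-surface", "cap-color", "bruises?"]

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, feature_index):
    # Split the rows into one segment per value of the feature, then subtract
    # the size-weighted entropy of the segments from the parent entropy.
    segments = {}
    for row in rows:
        segments.setdefault(row[feature_index], []).append(row[0])
    weighted = sum(len(seg) / len(rows) * entropy(seg) for seg in segments.values())
    return entropy([row[0] for row in rows]) - weighted

gains = {name: information_gain(rows, i + 1) for i, name in enumerate(features)}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {gain:.3f}")
```

On these five rows, cap-color scores highest (about 0.571 bits), partly because it has many distinct values that each isolate one or two rows; with so few rows, that bias towards many-valued features is easy to see.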