Training - Winder.AI Blog

Industrial insight and articles from Winder.AI, focusing on the topic Training

101: Why Data Science?

Jan 2018, in Training, Machine Learning, Data Science

What is Data Science? Software Engineering, Maths, Automation, Data. A.k.a. Machine Learning, AI, Big Data, etc. Its current rise in popularity is due to more data and more computing power. For more information: https://winderresearch.com/what-is-data-science/ Examples: US Supermarket Giants. Target: Optimising marketing using customer spending data. Walmart: Predicting demand ahead of a natural disaster. Discovery: Most projects are “Discovery Projects”.

102: How to do a Data Science Project

Jan 2018, in Training, Machine Learning, Data Science

Problems in Data Science: Understanding the problem with “the five-whys”. Different questions dramatically affect the tools and techniques used to solve the problem. Data Science as a Process: More Science than Engineering. High risk, high reward, difficult, unpredictable. By Kenneth Jensen CC BY-SA 3.0, via Wikimedia Commons. Impacts of Data Science: What is the purpose of the project?

201: Basics and Terminology

Jan 2018, in Training, Machine Learning, Data Science

The ultimate goal: First, let’s discuss what the goal is. What is the goal? The goal is to make a decision or a prediction. Based upon what? Information. How can we improve the quality of the decision or prediction? The quality of the solution is defined by the certainty represented by the information. Think about this for a moment. It’s a key insight. Think about your projects.

202: Segmentation For Classification

Jan 2018, in Training, Machine Learning

Segmentation: So let’s walk through a very visual, intuitive example to help describe what all data science algorithms are trying to do. This will seem quite complicated if you’ve never done anything like this before. That’s ok! I want to do this to show you that all algorithms that you’ve ever heard of have some very basic assumption of what they are trying to do. At the end of this, we will have completely derived one very important type of classifier.

203: Examples and Decision Trees

Jan 2018, in Training, Machine Learning

Example: Segmentation via Information Gain. There’s a fairly famous dataset called the “mushroom dataset”. It describes whether mushrooms are edible or not, depending on an array of features. The nice thing about this dataset is that the features are all categorical. So we can go through and segment the data for each value in a feature. This is some example data:

poisonous  cap-shape  cap-surface  cap-color  bruises?
p          x          s            n          t
e          x          s            y          t
e          b          s            w          t
p          x          y          w          t
e          x          s            g          f
etc.
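As a rough illustration of segmenting by information gain, the sketch below computes the gain of each feature using only the five example rows quoted in the excerpt. The function names `entropy` and `information_gain` are chosen here for illustration and are not taken from the original post; with so few rows the ranking is not meaningful for the full dataset.

```python
from collections import Counter
from math import log2

# Column names and the five example rows quoted in the excerpt.
COLUMNS = ["poisonous", "cap-shape", "cap-surface", "cap-color", "bruises?"]
ROWS = [
    ("p", "x", "s", "n", "t"),
    ("e", "x", "s", "y", "t"),
    ("e", "b", "s", "w", "t"),
    ("p", "x", "y", "w", "t"),
    ("e", "x", "s", "g", "f"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature_index, label_index=0):
    """Parent entropy minus the weighted entropy of each feature-value segment."""
    parent = entropy([row[label_index] for row in rows])
    segments = {}
    for row in rows:
        segments.setdefault(row[feature_index], []).append(row[label_index])
    weighted = sum(len(s) / len(rows) * entropy(s) for s in segments.values())
    return parent - weighted


# Gain for every feature column (index 0 is the label, so skip it).
gains = {COLUMNS[i]: information_gain(ROWS, i) for i in range(1, len(COLUMNS))}
best_feature = max(gains, key=gains.get)

if __name__ == "__main__":
    for name, gain in gains.items():
        print(f"{name}: {gain:.3f}")
    print("best:", best_feature)
```

On this tiny sample the feature with the most distinct values tends to win, which is a known bias of plain information gain rather than a statement about mushrooms.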

301: Data Engineering

Jan 2018, in Training, Machine Learning

Your job depends on your data. The goal of this section is to: talk about what data is and the context provided by your domain; discover how to massage data to produce the best results; find out how and where we can discover new data. If you have inadequate data you will not be able to succeed in any data science task. More generally, I want you to focus on your data.

302: How to Engineer Features

Jan 2018, in Training, Machine Learning

Engineering features. You want to do this because it: reduces the number of features without losing information; produces better features than the original; makes data more suitable for training. Another part of the data wrangling challenge is to create better features from current ones. Distribution/Model specific rescaling: Many models assume, or work best with, approximately normally distributed data. If you can, transform the data to be normal. Infer the distribution from the histogram (and confirm by fitting distributions).
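A minimal sketch of transforming data towards normality, assuming the feature is right-skewed and strictly positive so that a log transform applies. The excerpt does not name a specific transform, so the choice of `np.log`, the synthetic log-normal data, and the `skewness` helper are all assumptions made for illustration.

```python
import numpy as np


def skewness(values):
    """Sample skewness: mean of the cubed standardised values (0 for symmetric data)."""
    centred = values - values.mean()
    return np.mean((centred / values.std()) ** 3)


# Synthetic, strictly positive, right-skewed feature (hypothetical data).
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# The log of log-normal data is normal, so the transformed feature is near-symmetric.
transformed = np.log(raw)

raw_skew = skewness(raw)
transformed_skew = skewness(transformed)

if __name__ == "__main__":
    print(f"skewness before: {raw_skew:.2f}")
    print(f"skewness after:  {transformed_skew:.2f}")
```

In practice the histogram of the real feature should guide the choice of transform; a log is only appropriate when the values are positive and the tail is on the right.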

401: Linear Regression

Jan 2018, in Training, Machine Learning

Regression and Linear Classifiers: Traditional linear regression (a.k.a. Ordinary Least Squares) is the simplest and classic form of regression. Given a linear model in the form of: \begin{align} f(\mathbf{x}) & = w_0 + w_1x_1 + w_2x_2 + \dots \\ & = \mathbf{w} ^T \cdot \mathbf{x} \end{align} Linear regression finds the parameters \(\mathbf{w}\) that minimise the mean squared error (MSE)… The MSE is the mean of the squared differences between the predicted values and the actual values.
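The least-squares fit described above can be sketched with NumPy. The data here is synthetic and noise-free, and the names `true_w`, `X1` and `mse` are illustrative assumptions; a column of ones is prepended so that \(w_0\) is handled as part of \(\mathbf{w}^T \cdot \mathbf{x}\).

```python
import numpy as np

# Synthetic, noise-free data generated from known weights (w_0, w_1, w_2).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
true_w = np.array([1.0, 2.0, -3.0])

# Prepend a column of ones so the intercept w_0 is part of the weight vector.
X1 = np.column_stack([np.ones(len(X)), X])
y = X1 @ true_w

# Ordinary least squares: the weights that minimise the squared error.
w, *_ = np.linalg.lstsq(X1, y, rcond=None)

# MSE: mean of the squared differences between predictions and actual values.
mse = np.mean((X1 @ w - y) ** 2)

if __name__ == "__main__":
    print("estimated weights:", w)
    print("mean squared error:", mse)
```

With real, noisy data the recovered weights would only approximate the underlying ones and the MSE would be greater than zero.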

402: Optimisation and Gradient Descent

Jan 2018, in Training, Machine Learning

Optimisation: When discussing regression we found that these have closed-form solutions, i.e. solutions that can be computed directly. For many other algorithms there is no closed-form solution available. In these cases we need to use an optimisation algorithm. The goal of these algorithms is to iteratively step towards the correct result. Gradient descent: Given a cost function, the gradient descent algorithm calculates the gradient at the current position and moves in the opposite direction, i.e. downhill.
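A minimal sketch of gradient descent on a one-dimensional cost function. The cost \(f(x) = (x - 3)^2\), the learning rate and the step count are assumptions picked for illustration, and `gradient_descent` is a hypothetical helper rather than code from the original post.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the cost."""
    x = start
    for _ in range(steps):
        # Move downhill: subtract the gradient scaled by the learning rate.
        x = x - learning_rate * grad(x)
    return x


def cost_gradient(x):
    """Gradient of the example cost f(x) = (x - 3)^2, whose minimum is at x = 3."""
    return 2.0 * (x - 3.0)


minimum = gradient_descent(cost_gradient, start=0.0)

if __name__ == "__main__":
    print("estimated minimum:", minimum)
```

A learning rate that is too large makes the steps overshoot and diverge, while one that is too small makes convergence slow, so in practice it usually needs tuning.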

403: Linear Classification

Jan 2018, in Training, Machine Learning

Classification via a model: Decision trees created a one-dimensional decision boundary. We could easily imagine using a linear model to define a decision boundary. Previously we used fixed decision boundaries to segment the data based upon how informative the segmentation would be. The decision boundary represents a one-dimensional rule that separates the data. We could easily increase the number or complexity of the parameters used to define the boundary.
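To make the contrast concrete, the sketch below compares a tree-style rule that thresholds a single feature with a linear rule that combines both features. The points, threshold, weights and bias are all hand-picked hypothetical values, not fitted parameters or anything from the original post.

```python
import numpy as np

# Four hypothetical two-dimensional points.
points = np.array([[1.0, 1.0], [2.0, 0.5], [0.5, 3.0], [3.0, 3.0]])


def threshold_rule(x, feature=0, threshold=1.5):
    """Tree-style boundary: a cut on one feature only."""
    return x[:, feature] > threshold


def linear_rule(x, weights, bias):
    """Linear boundary: positive side of the line weights . x + bias = 0."""
    return x @ weights + bias > 0


tree_labels = threshold_rule(points)
linear_labels = linear_rule(points, weights=np.array([1.0, 1.0]), bias=-3.0)

if __name__ == "__main__":
    print("threshold on feature 0:", tree_labels.tolist())
    print("linear boundary:       ", linear_labels.tolist())
```

The linear rule can separate points along a diagonal, which a single axis-aligned threshold cannot do without further splits.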