Winder.AI Blog

Industrial AI insight about machine learning, reinforcement learning, MLOps, and more...

Entropy Based Feature Selection

Thu Nov 16, 2017, in Machine Learning, Workshop

Welcome! This workshop is from Winder.ai. Sign up to receive more free workshops, training and videos. One simple way to evaluate the importance of features (something we will deal with later) is to calculate the entropy for prospective splits. In this example, we will look at a real dataset called the “mushroom dataset”. It is a large collection of data about poisonous and edible mushrooms. Attribute Information: (classes: edible=e, poisonous=p) 1.
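To make "entropy for prospective splits" concrete, here is a minimal sketch. The function names and the tiny dataset are hypothetical and are not taken from the workshop or from the mushroom dataset; only the e/p class labels follow the excerpt. It compares how much each candidate feature reduces the entropy of the class labels.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def split_entropy(feature_values, labels):
    """Weighted average entropy of the labels after splitting on a feature."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    return sum(len(group) / total * entropy(group) for group in groups.values())


# Hypothetical toy data: e = edible, p = poisonous.
labels = ["e", "e", "e", "p", "p", "p"]
odour = ["n", "n", "n", "f", "f", "f"]  # separates the classes perfectly
cap = ["x", "b", "x", "b", "x", "b"]    # barely separates the classes

base = entropy(labels)
gain_odour = base - split_entropy(odour, labels)
gain_cap = base - split_entropy(cap, labels)

print(f"information gain (odour): {gain_odour:.3f}")
print(f"information gain (cap):   {gain_cap:.3f}")
```

In this toy example the feature with the larger entropy reduction is the more useful one to split on.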

Histograms and Skewed Data

Thu Nov 16, 2017, in Machine Learning, Workshop

Histograms and Inverting Skewed Data. When we first receive some data, it can be a mess. If we tried to force that data into a model, the results would more than likely be useless. So we need to spend a significant amount of time cleaning the data. This workshop is all about bad data.

Information and Entropy

Thu Nov 16, 2017, in Machine Learning, Workshop

Remember the goal of data science. The goal is to make a decision based upon some data. The quality of that decision depends on our information. If we have good, clear information then we can make well informed decisions. If we have bad, messy data then our decisions will be poor.

Introduction to Python and Jupyter Notebooks

Thu Nov 16, 2017, in Machine Learning, Workshop

This workshop is a quick introduction to using Python and Jupyter Notebooks. Python: for most Data Science tasks there are two competing Open Source languages. R is favoured more by those with a mathematical background. Python is preferred by those with a programming background; all of my workshops are currently in Python.

Why Correlating Data is Bad and What to do About it

Thu Nov 16, 2017, in Machine Learning, Workshop

Correlations between features are bad because you are effectively telling the model that this information is twice as important as everything else. You’re feeding the model the same data twice. Technically this is known as multicollinearity, which is the generalisation to any number of features that could be correlated. Generally, correlated features will decrease the performance of your model, so we need to find them and remove them.
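One way to find correlated features is to compute the Pearson correlation for every pair and flag the pairs above a threshold. The sketch below is illustrative only: the helper names, the 0.9 threshold, and the toy feature values are assumptions, not the workshop's method.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for each pair with |r| >= threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(features.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical features: the same height recorded in two units, plus an
# unrelated column.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_colour_code": [3.0, 1.0, 4.0, 1.0, 5.0],
}

flagged = correlated_pairs(features)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are highly correlated (r = {r:.3f})")
```

Here the two height columns carry the same information, so one of them would be a candidate for removal.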

Root Cause Analysis: The 5-Whys

Sat Nov 11, 2017, in Machine Learning, Workshop

Deciding what problem you should try to solve is one of the hardest steps to get right in Data Science. If you get it wrong, then you’ll spend significant amounts of time freewheeling around the rest of the data science process and end up with something that nobody wants or cares about. There is nothing worse than someone suggesting that your work has no value.

Probability Distributions

Sun Oct 29, 2017, in Machine Learning, Workshop

This workshop is about another way of presenting data. We can plot how frequent observations are to better characterise the data. Imagine you had some data. For the sake of example, imagine that it is a measure of people's heights. If you measured 10 people, then you would see 10 different heights. The heights are said to be distributed along the height axis.
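As a rough illustration of counting how frequent observations are, the sketch below bins ten heights into 10 cm ranges and prints a text histogram. The height values, the bin width, and the `histogram` helper are all hypothetical.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


# Hypothetical heights (in cm) of 10 people, all different.
heights_cm = [158, 162, 165, 167, 171, 172, 174, 176, 181, 189]

counts = histogram(heights_cm, bin_width=10)
for lower, n in counts.items():
    print(f"{lower}-{lower + 9} cm: {'#' * n}")
```

Even with only ten observations, the counts show most heights clustering in the middle bins.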

Mean and Standard Deviation

Fri Oct 27, 2017, in Machine Learning, Workshop

This workshop is about two fundamental measures of data. I want you to start thinking about how you can best describe or summarise data. How can we best take a set of data and describe that data in as few variables as possible? These are called summary statistics because they summarise statistical data.
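A short sketch of these two summary statistics, using Python's standard `statistics` module. The height values are made up for illustration.

```python
import statistics

# Hypothetical heights in cm.
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]

mean = statistics.mean(heights_cm)
# Population standard deviation: divides by n.
population_std = statistics.pstdev(heights_cm)
# Sample standard deviation: divides by n - 1.
sample_std = statistics.stdev(heights_cm)

print(f"mean: {mean:.1f} cm")
print(f"population standard deviation: {population_std:.2f} cm")
print(f"sample standard deviation: {sample_std:.2f} cm")
```

Two numbers, a centre and a spread, summarise the whole list; the sample version is slightly larger because it divides by n - 1 rather than n.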

Why do we use Standard Deviation?

Fri Oct 27, 2017, in Machine Learning, Workshop

Why do we use Standard Deviation and is it Right? It’s a fundamental question and it has knock-on effects for all algorithms used within data science. But what is interesting is that there is a history. People haven’t always used variance and standard deviation as the de facto measure of spread. But first, what is it? The Standard Deviation is used throughout statistics and data science as a measure of “spread” or “dispersion” of a feature.

Research-Driven Development: Improve the Software You Love While Staying Productive

Mon Oct 16, 2017, in Software Engineering, Talk

Abstract: Have you ever wondered which parts of your job you love or hate? Chances are that, like most developers, you love learning and solving new problems. You hate monotony and bureaucracy. You’ve probably put strategies in place to mitigate the things you don’t like, such as an anarchic development process like Agile to reduce the amount of time in meetings. But have you ever thought about the way in which you approach learning and problem solving?