An Introduction to Reinforcement Learning: All You Need to Know

When a child wants to ride a bike, they learn by doing. This process of trial and error is also known as learning through reinforcement, because positive and negative experiences promote or discourage certain behaviours, respectively. Children learn to ride by avoiding the actions that cause them to crash and repeating the ones that reward them with the thrill of feeling the wind in their hair.

Reinforcement Learning In Business

The application of this style of learning by trial and error is called reinforcement learning. An automated agent, codified in software, is unleashed upon the world in an attempt to learn desirable behaviours. The definition of desirable is controlled by the reward, or punishment, given to the agent. Engineers develop rewards that are aligned with the problem definition.

The Difference Between Reinforcement Learning and Machine Learning

Machine learning has three distinguishing features. It uses an entire data set to learn a model. It is (often) supervised, which means that the model knows what it is supposed to produce. And it makes a single decision.

Reinforcement learning is the polar opposite. It samples experiences. It evaluates its performance via indirect feedback. And it makes multiple decisions over time to maximise the reward.
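
To make those three traits concrete, here is a minimal sketch in Python of tabular Q-learning, one common RL algorithm, on a toy problem. Everything in it is an assumption made for illustration: the five-cell corridor, the `step` and `train` functions, and the hyperparameter values. The agent samples its own experience, only receives a reward when it reaches the final cell, and has to string several decisions together to get there.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right


def step(state, action):
    """Toy environment: move one cell; reward 1.0 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore occasionally (and whenever the estimates are tied),
            # otherwise pick the action with the higher estimated value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # The feedback is indirect: most steps earn nothing, so value
            # has to be propagated back from the goal over many episodes.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

With these settings the learned policy should be to move right from every cell, even though nobody ever told the agent which action was correct.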

Thinking back to the bike analogy, you can see why machine learning is not a solution to this problem. The child doesn’t possess an entire dataset as they’ve never ridden a bike. They don’t have direct supervision because their parent can’t micro-manage every movement. And they can’t ride a bike by making one single decision; it’s a reactive process.

This means that certain business problems may be better solved by reinforcement learning.

How to Find Reinforcement Learning Problems in Business

Reinforcement learning works best when an objective is achieved only after multiple, sequential decisions.

One test I often apply is to imagine how I, as a human, would solve the problem at hand. If I can imagine a single rule or judgement, like accepting customers that have a credit score above 400 or reading a number plate on a car, then this task might be best left to machine learning.

But if I need to develop a strategy for reacting to complex scenarios, or if I need to try something first to find out what the reaction is, then the problem is well suited to reinforcement learning.

Some common strategic examples of reinforcement learning are playing games, targeting customers, controlling industrial processes, and automating experiments. I maintain a more comprehensive list of the industrial applications of reinforcement learning on my book’s website.

Common Challenges in Reinforcement Learning

Given a clear problem definition, ensuring that reinforcement learning is a good fit is an early but important milestone.

Recall that agents need to experiment, which means they need to interact with the environment, where the environment is the context within which the agent resides. In many scenarios, it is expensive, or downright dangerous, to allow an agent to explore to its heart’s (clock’s?) content.

Yes, you can quite easily enforce safety through constraints and boundaries, but ideally, you want to allow the agent to explore unhindered.
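
As a rough illustration of what such a constraint might look like, the sketch below wraps an agent's proposed action in a safety layer before it reaches the real system. The scenario (a valve opening, in percent), the `SafetyLimits` values, and the `constrain` function are all hypothetical. The trade-off is visible in the code: anything the agent proposes outside the limits is never tried, so it can never learn what would have happened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyLimits:
    """Hypothetical bounds for a valve-opening action, in percent."""
    low: float = 10.0
    high: float = 80.0
    max_step: float = 5.0  # largest change allowed per decision


def constrain(proposed, previous, limits=SafetyLimits()):
    """Clamp a proposed action to the safe range and the per-step rate limit."""
    # Make sure the reference point itself is inside the safe range.
    previous = min(max(previous, limits.low), limits.high)
    lowest = max(limits.low, previous - limits.max_step)
    highest = min(limits.high, previous + limits.max_step)
    return min(max(proposed, lowest), highest)


print(constrain(proposed=95.0, previous=78.0))  # held at the upper bound
print(constrain(proposed=20.0, previous=50.0))  # held back by the rate limit
print(constrain(proposed=52.0, previous=50.0))  # already safe, passed through
```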

The common solution to this problem is to leverage a simulation of the environment. Depending on your domain, this might be as simple as using an off-the-shelf solution, like a 3D game engine. But for many projects, a simulation must be developed to gain confidence in the agent or provide pre-training.

Another common set of problems is that, despite a clear problem definition, it can be hard to observe the right data in the environment or to define an adequate reward. The result is an agent that either cannot learn, because it cannot “see”, or learns the wrong thing, because your definition of success isn’t sufficient to solve the problem.

How to Start a Reinforcement Learning Project

RL projects, like ML projects, are inherently risky.

All AI consulting projects are dependent on external assets like data or access to domain experts. AI projects also depend on solving the right problem, which is surprisingly difficult to nail down. So even with the best experts in the world, it is still reasonably likely that the project will fail, or at least fail to meet high expectations. One of the easiest ways of reducing this risk is by performing a reinforcement learning POC.

Proof of concept (POC) projects reduce the risk of investing large amounts of money into projects that have no hope of success. POCs aim to isolate the riskiest parts of the project and prove that the solution is viable, even with these risks.

It is often possible to derisk a project by actively researching or developing parts of the project that are deemed a risk. If this small sub-project is a success, then this gives us confidence that the whole project will also be a success.

The Future of Reinforcement Learning in Business

As a company specializing in reinforcement learning consulting, we come across many different organizations attempting to develop competitive advantages using new technology. The reason why RL is especially interesting is that it enables the automation of tasks that were either impossible or incredibly difficult to automate before. One example that I like to talk about is a project at YouTube, owned and operated by some of the brightest minds at Google, where engineers swapped out the YouTube search algorithm and replaced it with one based upon RL. The RL solution learned to improve its recommendations from click-through feedback from users and very quickly matched the performance of the current implementation. After a little more time it surpassed it.

Bear in mind that the recommendation algorithms at Google are, almost by definition, the best in the world. This is Google’s core technology. But an RL implementation, with very little supervision, rapidly surpassed the performance of an algorithm developed over many years by some very smart data scientists.
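
To give a feel for what learning from click-through feedback means, here is a deliberately tiny sketch: an epsilon-greedy bandit choosing which of three items to show. This is not YouTube's system, which I have no inside view of. The `run_bandit` function, the simulated click rates, and the parameter values are all assumptions made for illustration.

```python
import random


def run_bandit(click_rates, rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit that learns which item earns the most clicks.

    click_rates holds the true (hidden) click probability of each item and
    is only used to simulate users; the agent never reads it directly.
    """
    rng = random.Random(seed)
    n_items = len(click_rates)
    shows = [0] * n_items
    clicks = [0] * n_items
    for _ in range(rounds):
        if rng.random() < epsilon or 0 in shows:
            # Explore: show a random item (always, until each was shown once).
            item = rng.randrange(n_items)
        else:
            # Exploit: show the item with the best observed click-through rate.
            item = max(range(n_items), key=lambda i: clicks[i] / shows[i])
        shows[item] += 1
        if rng.random() < click_rates[item]:  # simulated user feedback
            clicks[item] += 1
    return shows, clicks


shows, clicks = run_bandit([0.05, 0.10, 0.40])
print(shows)
print(clicks)
```

The agent starts with no knowledge of the items, yet after enough rounds it should be showing the most clickable one the vast majority of the time, purely because of the feedback it collected.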

Examples like this lead me to the conclusion that it is only a matter of time before all problems that have sequential decisions and abstract measures of success are solved with RL.

I’m also confident that RL as a technology will move upwards through the organizational hierarchy. Businesses are quite used to automating processes with software and are coming to grips with automating decisions using machine learning. But now we can begin automating strategies, the hallowed domain of executives, to improve efficiency throughout the entire business.

Competitive Differentiation

Capitalism demands the continuous optimisation of a business and therefore automation is here to stay. Incumbents are taking advantage of new technologies like reinforcement learning to achieve feats that were previously impossible.

Take FreshFlow as an example. We’ve been guiding them after the founders took inspiration from an example in my reinforcement learning book, where I create an agent that is capable of automatically learning what you need from a shop and when.

I imagined this because I hate food shopping, mainly because 90% of my shopping is the same. But because I use food at different rates, I, unfortunately, can’t just buy the same shopping over and over again. The example I described took a person’s shopping history and slowly started sending products automatically. The customer had the option to reject or return items. This feedback taught the RL agent what should be sent and when. This idea is poised to massively change the way we order goods.
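
The feedback loop described above can be sketched in a few lines. This is a toy written for this article, not the example from the book and not FreshFlow's product: the `DeliveryAgent` class, the `simulate` function, and the customer who only accepts milk after three days are all hypothetical. The agent keeps a value estimate for sending a product after a given number of days and adjusts it from accept or reject feedback.

```python
from collections import defaultdict


class DeliveryAgent:
    """Toy agent that learns, per product, how long to wait between deliveries."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        # value[product][gap] estimates the reward of sending after `gap` days.
        self.value = defaultdict(lambda: defaultdict(float))

    def should_send(self, product, gap_days):
        # Untried gaps start at 0.0, so each one is attempted at least once.
        return self.value[product][gap_days] >= 0.0

    def feedback(self, product, gap_days, accepted):
        reward = 1.0 if accepted else -1.0
        old = self.value[product][gap_days]
        self.value[product][gap_days] = old + self.learning_rate * (reward - old)


def simulate(days=30, needed_gap=3):
    """Simulate a customer who rejects milk sent fewer than needed_gap days apart."""
    agent = DeliveryAgent()
    gap, rejected, delivered_on = 0, 0, []
    for day in range(1, days + 1):
        gap += 1
        if agent.should_send("milk", gap):
            accepted = gap >= needed_gap  # hypothetical customer behaviour
            agent.feedback("milk", gap, accepted)
            if accepted:
                delivered_on.append(day)
                gap = 0
            else:
                rejected += 1
    return rejected, delivered_on


rejected, delivered_on = simulate()
print(rejected, delivered_on)
```

In this simulation the agent annoys the customer twice, then settles into sending milk every third day. A real agent would also need to keep exploring, because this one never revisits a gap that was rejected, even if the customer's habits change.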

I’d go so far as to suggest that within five years this will become the norm, especially for routine and regular purchases. Enjoy your food shopping while you still can!