The Future of Transportation Infrastructure: Reinforcement Learning

An investigation into the proven uses of reinforcement learning in transportation and associated infrastructure.

The lockdowns endured during the coronavirus pandemic have given many people the opportunity to work from home, often for the first time. Along with the guilt of failing at home-schooling and the struggle of working alongside noisy babies or animals, lockdown has entirely changed the way in which we travel.

When I speak to people about the pandemic, the lack of a commute is one of the few positives they can take from the experience, and it has led some to question why they are paying for accommodation in some of the most expensive areas of the UK.

This break from normality represents the perfect opportunity to consider what new tools can help build and improve the future of transportation infrastructure. The Alan Turing Institute was kind enough to ask me what I thought about this, which resulted in a joint article with Professor John Moriarty.

My machine learning consultancy, Winder.AI, is particularly invested in a technique called reinforcement learning, which has the potential to disrupt industries in the way machine learning did a few years ago.

Reinforcement Learning for Transportation

Reinforcement learning (RL) is a machine learning paradigm for solving problems that require sequential decisions. It works by learning a strategy, over time, through trial and error. This framework sounds simple, but highly complex and often surprising behaviour can emerge. You can learn more about the details of reinforcement learning in my new book.
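To make the trial-and-error idea concrete, here is a minimal tabular Q-learning sketch on a toy one-dimensional "corridor", where the agent must learn to walk right towards a goal. The environment, states, and hyperparameters are invented purely for illustration; they are not drawn from any real transport system.

```python
import random

random.seed(0)

# A toy corridor: the agent starts at position 0 and must reach position 4.
# Actions: 0 = move left, 1 = move right. Reward is 1 only on reaching the goal.
N_STATES, GOAL = 5, 4
q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-value table: q[state][action]
alpha, gamma, epsilon = 0.1, 0.9, 0.2      # learning rate, discount, exploration rate

for episode in range(1000):
    state = 0
    while state != GOAL:
        # Trial and error: occasionally explore a random action, else exploit.
        if random.random() < epsilon:
            action = random.choice([0, 1])
        else:
            action = 0 if q[state][0] > q[state][1] else 1
        next_state = max(0, min(GOAL, state + (1 if action == 1 else -1)))
        reward = 1.0 if next_state == GOAL else 0.0
        # Q-learning update: nudge the estimate towards reward + discounted future value.
        q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
        state = next_state

# The learned greedy strategy for states 0..3 is to always move right.
policy = [0 if left > right else 1 for left, right in q]
print(policy[:GOAL])
```

The learned strategy here is trivial, but the same update rule scales, via neural-network function approximation, to the transportation problems discussed below.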

Transportation, in particular, has a wide variety of such challenges. Take autonomous vehicles, now a field in its own right, as an example. The entire premise of moving something or someone from one point to another is one giant sequential decision-making problem.

But machine learning discovers models from data. RL is no different; in fact it often needs more data, because it must explore to find optimal solutions, which of course means that until it converges, behaviour is sub-optimal and sometimes catastrophic. There is a wide range of technical solutions to promote safe RL, but most researchers use simulations, at least initially. Even after an agent has proven itself in the real world, it is useful to transfer new data and learning back to the simulation.
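One of the simpler safe-RL techniques is sometimes called an action shield: a hand-written safety layer that vets the agent's exploratory actions before they reach the (simulated or real) vehicle. The state fields, action names, and threshold below are all hypothetical, purely to show the shape of the idea:

```python
def shield(state, proposed_action, min_gap=10.0):
    """Override unsafe proposals; fields and thresholds here are illustrative."""
    # If accelerating while too close to the vehicle ahead, force a brake instead.
    if proposed_action == "accelerate" and state["gap_ahead"] < min_gap:
        return "brake"
    return proposed_action

# The agent proposes an action; the shield gets the final say.
print(shield({"gap_ahead": 4.0}, "accelerate"))   # → brake
print(shield({"gap_ahead": 50.0}, "accelerate"))  # → accelerate
```

The agent still explores freely within the safe region, but catastrophic actions never reach the actuators.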

But what are the most common problems being tackled in transportation?

Autonomous Vehicles

Autonomous driving has a variety of problems that are suitable for RL. Common tasks involve sensing the surroundings, utilizing information from other vehicles, decision-making, strategic planning, and task execution.

A significant amount of work has focussed on the direct control of vehicles, often in constrained settings, such as on a motorway, or for narrow tasks like automatic lane changing.

Some of the most intriguing research investigates aspects other than direct control. One paper that caught my eye used inverse RL to learn the reward function of an average human driver, rather than attempting to specify a good reward function up front, which is notoriously hard to do.

Electric Vehicles and Routing

Vehicle routing is a common RL problem, but electric vehicles (EVs) pose additional challenges because they need frequent recharging and the number of recharging points is limited. Researchers have looked at combining graphs and RL to redistribute EVs for commercial fleets like those used in logistics. Similar problems exist for short-term car hire companies as well.
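As a toy illustration of why EVs complicate routing, here is a hand-coded planner that inserts a recharging stop whenever the next leg exceeds the remaining range. The stop positions, range, and charger locations are invented; a real RL agent would learn when and where to charge rather than follow this fixed rule:

```python
# Positions of stops along a single road, in km (hypothetical).
stops = {"depot": 0, "A": 12, "B": 30, "C": 45}
CHARGERS = {"depot", "B"}  # stops that have a recharging point
FULL_RANGE = 40            # km of range on a full battery

def plan(route):
    """Drive the route in order, inserting charging stops when range runs short."""
    battery, pos, legs = FULL_RANGE, "depot", []
    for stop in route:
        dist = abs(stops[stop] - stops[pos])
        if dist > battery:
            if pos not in CHARGERS:
                raise ValueError(f"stranded at {pos}: no charger here")
            legs.append(("charge", pos))  # recharge before attempting this leg
            battery = FULL_RANGE
        battery -= dist
        legs.append(("drive", stop))
        pos = stop
    return legs

print(plan(["A", "B", "C"]))
```

Even this simplistic rule shows how charger placement constrains feasible routes; an RL fleet controller has to balance those constraints against demand across many vehicles at once.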

Traffic Signal Control

All vehicular forms of transport are limited by signalled junctions. The idea is that the control optimizes vehicle flow, but most currently operate on static logic and overly-simplistic assumptions about the traffic distribution. It has been repeatedly shown that RL vastly outperforms traditional traffic signalling techniques.
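To see why static logic struggles, consider this toy junction simulation. A simple adaptive rule, serving whichever queue is longer, stands in for a learned policy, and already beats a fixed alternating plan when demand is asymmetric. The arrival and service rates are invented for illustration:

```python
def step(queues, action, arrivals=(2, 1), service=3):
    """One signal cycle: cars arrive on both axes, then the green axis discharges."""
    ns, ew = queues[0] + arrivals[0], queues[1] + arrivals[1]
    if action == "ns_green":
        ns = max(0, ns - service)
    else:
        ew = max(0, ew - service)
    return (ns, ew), -(ns + ew)  # reward: negative total queue length

def run(policy, cycles=50):
    """Total reward accumulated by a signal-control policy over many cycles."""
    queues, total = (0, 0), 0
    for t in range(cycles):
        action = policy(queues, t)
        queues, reward = step(queues, action)
        total += reward
    return total

# Static logic: alternate the green phase regardless of demand.
fixed = run(lambda queues, t: "ns_green" if t % 2 == 0 else "ew_green")
# Adaptive logic: give green to the longer queue (a stand-in for a learned policy).
adaptive = run(lambda queues, t: "ns_green" if queues[0] >= queues[1] else "ew_green")
print(adaptive > fixed)  # → True: the adaptive controller accumulates less delay
```

An RL agent goes further than this greedy rule by anticipating traffic patterns and coordinating across neighbouring junctions, but even this sketch shows how much is lost by ignoring the actual traffic distribution.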

Modern highway infrastructure, such as the “smart motorways” in the UK, includes variable speed limits. These could be set by an RL agent to optimize overall traffic flow, and could be far more successful than the manual schemes used today.

Near where I live, there is a section of the M1 motorway where it merges with the A1(M) and then, 1 kilometre later, diverges into another two roads. This is a nightmare to navigate, even for seasoned drivers, because cars are merging from all directions. In the future, when it is possible to signal required speeds to individual cars or drivers, RL opens up the possibility of automating low- to no-risk merging of vehicles joining from slip roads. Even today this is possible where slip roads have some limited form of control, like traffic lights.

Planes, Trains and … Ships

It’s not only road-going transport that can be improved by RL. Overland and underground train scheduling is an obvious sequential decision-making challenge, where rescheduling, re-routing, or cancelling trains can help alleviate track congestion whilst minimizing passenger delays. Automatic train operations and train shunting operations are also scheduling problems, with the added complexity of specifying train speeds or locations. The Dutch rail operator NS is one user that leverages RL for shunting operations.

Therefore, it may not come as a surprise to find that air traffic control is another application area, mainly because it is becoming increasingly difficult to manually control such a volume of flying objects. With increasing numbers of low-altitude drones and vertical take-off and landing (VTOL) vehicles, a more sophisticated, automated solution like RL is increasingly necessary.

However, you may not expect that another popular topic for RL is the automated control of ships. Shipping represents the confluence of two core RL applications: scheduling and control. Since shipping as a commercial activity exists because of logistical requirements, optimizing the scheduling and routing of goods to their destination is of obvious appeal. More recently, autonomous ships are being considered to improve safety, reduce costs, lessen environmental impact, and deter piracy. The control and navigation of these ships can be delegated to RL agents because of their ability to handle complex and dynamic systems.

Transport Equality and Final Thoughts

Whilst researching this article, I read a lot about controlling or directing transportation to minimize journey time or maximize flow. This type of application, although technically challenging due to safety concerns and a lack of data, is fairly easy to comprehend. But I want to highlight one RL paper I came across that really stood out, because it takes a completely different look at how transportation infrastructure is prioritized.

Education and health outcomes are linked to the accessibility of high-quality institutions like schools, hospitals, libraries, or even markets. In urban transportation networks, different racial and socioeconomic groups have varying access to these facilities.

Governmental organizations can step in to improve facilities in deprived areas, or to provide better transportation links. But these can be expensive (HS2?), take a long time (Berlin Brandenburg Airport?), or be unsuitable in the first place (Concorde?). From an implementation perspective, it could be as simple as changing a bus route. So why not use RL to optimize bus routes to improve social equality!

Clearly this is another scheduling/routing problem, which makes it ideal for RL, but that’s not the point. I love how RL is (potentially) improving the lives of people that otherwise don’t have the power to control their own future.

RL, a technique that optimizes sequential decision-making, is at the forefront of machine learning. It has the potential to revolutionize how we travel, how we move goods, and hopefully the lives of those less fortunate. And if you’d like to know more about RL, then I can wholeheartedly recommend my book, Reinforcement Learning: Industrial Applications of Intelligent Agents.

Further Reading

Below is a collection of links that I used while researching this article that you may also find useful: