In this article we present MADRaS: Multi-Agent DRiving Simulator. It is a multi-agent version of TORCS, a racing simulator popularly used for autonomous driving research by the reinforcement learning and imitation learning communities. See:
- A human-like TORCS controller for the Simulated Car Racing Championship
- Learning to overtake in TORCS using simple reinforcement learning
- Learning drivers for TORCS through imitation using supervised methods
- Asynchronous Methods for Deep Reinforcement Learning
- Continuous Control With Deep Reinforcement Learning
MADRaS is a multi-agent extension of Gym-TORCS. It is open source, lightweight, and easy to install, and it exposes the OpenAI Gym API, which makes it well suited to beginners in autonomous driving research. It enables independent control of tens of agents within the same environment, opening up a promising direction in multi-agent reinforcement learning and imitation learning research aimed at acquiring human-like negotiation skills in complicated traffic situations, a major challenge in autonomous driving that all major players are racing to solve.
Most open-source autonomous driving simulators (like CARLA*, DeepDrive, AirSim, and Udacity* SDC) natively support only egocentric control (that is, control of a single agent) and rely on preprogrammed behaviors for the other agents. The difficulty of introducing agents with custom behaviors in these simulators restricts the diversity of real-world scenarios that can be simulated. To address this issue, we developed MADRaS, wherein each car on the racing track can be independently controlled. This enables the creation of rich, custom-made traffic scenarios and the learning of control policies for multiple agents simultaneously.
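To make "each car can be independently controlled" concrete, here is a minimal sketch of what a Gym-style interaction loop looks like once it is extended to several agents. The class `ToyMultiCarEnv`, the helper `run_episode`, the agent names, and the toy dynamics are all hypothetical stand-ins written for this illustration; they are not the MADRaS API. The only point is the shape of the contract: one action in and one observation, reward, and done flag out, per agent.

```python
from typing import Callable, Dict, Tuple

Observation = Tuple[float, float]  # (position, speed) of one car


class ToyMultiCarEnv:
    """Hypothetical multi-agent environment on a straight track.

    NOT the MADRaS API: it only mimics the Gym-style reset()/step()
    contract, with dictionaries keyed by agent id.
    """

    def __init__(self, n_agents: int = 3, track_length: float = 100.0):
        self.agent_ids = [f"car_{i}" for i in range(n_agents)]
        self.track_length = track_length
        self.pos: Dict[str, float] = {}
        self.speed: Dict[str, float] = {}

    def reset(self) -> Dict[str, Observation]:
        # Stagger the cars along the track, all starting at rest.
        self.pos = {a: 5.0 * i for i, a in enumerate(self.agent_ids)}
        self.speed = {a: 0.0 for a in self.agent_ids}
        return self._observe()

    def step(self, actions: Dict[str, float]):
        # Every agent supplies its own acceleration command in [-1, 1].
        for a in self.agent_ids:
            accel = max(-1.0, min(1.0, actions[a]))
            self.speed[a] = max(0.0, self.speed[a] + accel)
            self.pos[a] += self.speed[a]
        rewards = {a: self.speed[a] for a in self.agent_ids}  # reward progress
        dones = {a: self.pos[a] >= self.track_length for a in self.agent_ids}
        return self._observe(), rewards, dones, {}

    def _observe(self) -> Dict[str, Observation]:
        return {a: (self.pos[a], self.speed[a]) for a in self.agent_ids}


def run_episode(env: ToyMultiCarEnv,
                policies: Dict[str, Callable[[Observation], float]],
                max_steps: int = 200) -> Dict[str, float]:
    """Roll out one episode with a separate policy per agent."""
    obs = env.reset()
    totals = {a: 0.0 for a in env.agent_ids}
    for _ in range(max_steps):
        actions = {a: policies[a](obs[a]) for a in env.agent_ids}
        obs, rewards, dones, _ = env.step(actions)
        for a in env.agent_ids:
            totals[a] += rewards[a]
        if all(dones.values()):
            break
    return totals
```

Each entry in `policies` can be a learned controller, a scripted one, or a mix of both, which is how custom traffic scenarios can be assembled in a setup like this.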
The task of negotiation in traffic can be posed as that of finding the winning strategy in a multi-agent game, wherein multiple entities (cars, buses, two-wheelers, and pedestrians) are trying to achieve their objectives of getting from one place to another fast, yet safely and reliably. Imitation learning algorithms like Behavioral Cloning, Active Learning, and Apprenticeship Learning (Inverse Reinforcement Learning followed by Reinforcement Learning) have proved to be effective for learning such sophisticated behaviors, under a multitude of simplifying assumptions and constraining conditions. A major portion of the contemporary literature makes the single-agent assumption: the agent acts in an environment with many other agents, similar or different, but does not interact with any of them. This robs it of data and information that could be extremely useful in decision making, at both the egocentric and collaborative levels.
Driving, however, is inherently multi-agent, and the following is a partial list of things that become possible once we get rid of the single-agent assumption.
One of the earliest instances of multi-agent systems being deployed in vehicles (starting way back in 1993!) was in the use of platooning, wherein vehicles travel at highway speeds with small inter-vehicle spacing to reduce congestion and still achieve high throughput without compromising safety. Now it seems obvious that autonomous cars in the near future will communicate, cooperate, and form platoons over intersecting lengths of their commutes.
Image source: eDriving*
Image source: phys.org
Apart from transferring information about pile-ups and possible diversions ahead to all the vehicles in the geographical vicinity, this power of reliable communication can be used to pool together the knowledge of multiple learning agents. For intuition, consider a large gridworld. Suppose a single learning agent can solve it in n hours of training. With multiple learning agents pooling their experiences, we could cut down the training time significantly, possibly even linearly in the number of agents!
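As a rough illustration of the gridworld intuition, the sketch below lets several agents explore the same small gridworld while writing into one shared Q-table, which is one simple way of pooling experience. Everything here is an assumption made for the example: the 4x4 grid, the reward of 1 at the goal, tabular Q-learning as the learning rule, and the names `move` and `pooled_q_learning`. It shows the mechanism only and does not demonstrate a linear speedup.

```python
import random
from collections import defaultdict

SIZE = 4                                  # 4x4 gridworld (illustrative choice)
START, GOAL = (0, 0), (3, 3)
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def move(state, action):
    """Apply a move, staying in place if it would leave the grid."""
    row = min(max(state[0] + MOVES[action][0], 0), SIZE - 1)
    col = min(max(state[1] + MOVES[action][1], 0), SIZE - 1)
    return (row, col)


def pooled_q_learning(n_agents, steps_per_agent, alpha=0.5, gamma=0.9,
                      epsilon=0.2, seed=0):
    """Tabular Q-learning where all agents update one shared table."""
    rng = random.Random(seed)
    q = defaultdict(float)                # shared: (state, action) -> value
    states = [START] * n_agents           # each agent keeps its own position
    n_updates = 0
    actions = range(len(MOVES))
    for _ in range(steps_per_agent):
        for i in range(n_agents):
            s = states[i]
            if rng.random() < epsilon:
                a = rng.randrange(len(MOVES))          # explore
            else:
                best = max(q[(s, b)] for b in actions)
                # Greedy action, with ties broken at random.
                a = rng.choice([b for b in actions if q[(s, b)] == best])
            s2 = move(s, a)
            done = s2 == GOAL
            reward = 1.0 if done else 0.0
            future = 0.0 if done else max(q[(s2, b)] for b in actions)
            # Standard Q-learning update, written into the shared table.
            q[(s, a)] += alpha * (reward + gamma * future - q[(s, a)])
            n_updates += 1
            states[i] = START if done else s2          # restart at the goal
    return q, n_updates


if __name__ == "__main__":
    q, n_updates = pooled_q_learning(n_agents=4, steps_per_agent=500, seed=1)
    print("pooled updates:", n_updates)
```

With `n_agents` agents, every sweep of the outer loop contributes `n_agents` transitions to the same table, so the table sees experience that many times faster per time step than with a single agent. Whether that translates into a proportional reduction in training time depends on how much the agents' experience overlaps.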
There’s a host of untapped literature on communication among multiple agents in various environments (not autonomous driving... yet.) See:
- Emergence of Grounded Compositional Language in Multi-Agent Populations
- Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
- Modeling Others using Oneself in Multi-Agent Reinforcement Learning
- Learning to Communicate with Deep Multi-Agent Reinforcement Learning
Now this raises important questions about the reliability of the communication between vehicles. With the imminent advent of 5G [1], fast and reliable communication between vehicles can help enable the training and deployment of completely hands-free autonomous cars.
Drivers on the road constantly anticipate the potential actions of fellow drivers. For example, during close maneuvering in car parks and at intersections, drivers make eye contact to ensure a shared understanding. The Defense Advanced Research Projects Agency (DARPA) stated that traffic vehicle drivers, unnerved by being unable to make eye contact with the robots, had resorted to watching the front wheels of the robots for an indication of their intent [2].
Image source: The Star
Multi-agent learning comes with its own share of complications:
- Nonstationarity: A moving-target problem, since the best policy changes as the other agents’ policies change.
- Curse of dimensionality: The exponential growth of state and action variables with the number of agents.
- Specifying a good goal: Difficult, since the agents’ returns are correlated and cannot be maximized independently.
- Exploration: Apart from having to explore the environment, the agents also have to obtain information about other agents.
- Coordination: The effect of an agent’s action on the environment also depends on the actions taken by other agents, hence the need for mutually consistent actions in order to achieve the intended effect.
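A quick calculation shows how fast the curse of dimensionality bites. The sketch below assumes, purely for illustration, that each car chooses among 9 discrete actions (say, 3 steering options times 3 throttle options); the function name `joint_space_size` and that action count are assumptions, not properties of MADRaS or TORCS.

```python
def joint_space_size(per_agent_size: int, n_agents: int) -> int:
    """Number of joint actions when each agent picks independently."""
    return per_agent_size ** n_agents


# Assumed: 9 discrete actions per car (illustrative only).
for n_agents in (1, 2, 5, 10):
    print(n_agents, joint_space_size(9, n_agents))
# 1 9
# 2 81
# 5 59049
# 10 3486784401
```

A learner that treats all cars as one centralized agent has to search this joint space, which is one reason multi-agent methods try to factor the problem across agents instead.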
But remember why we started solving fully autonomous driving (FAD) in the first place. Writing for Technology Review, Will Knight outlines the possibilities of our driverless car future:
- Safer transportation
The National Highway Traffic Safety Administration estimates that more than 90 percent of road crashes involve human error, a figure that has led some experts to predict that autonomous driving will reduce the number of accidents on the road by a similar percentage. Assuming the technology becomes ubiquitous and does have such an effect, the benefits to society will be huge. Almost 33,000 people die on the roads in the United States each year, at a cost of USD 300 billion, according to the American Automobile Association. The World Health Organization estimates that, worldwide, over 1.2 million people die on roads every year [3].
- Improved fuel efficiency
Apart from less traffic congestion due to fewer accidents, it is also expected that the rise of self-driving taxis will help decrease the total number of cars on the road, alleviating overall traffic. And because driverless vehicles will be designed to optimize efficiency in acceleration and braking, the adoption of autonomous cars could reduce CO2 emissions produced by cars by as much as 300 million tons per year. Autonomous vehicles traveling in high-speed platoons that reduce aerodynamic drag could also reduce fuel consumption by 20 percent [4].
- Less traffic?
Driverless cars communicating with each other and their surroundings can find and exploit the optimal routes more effectively, which will help spread the demand for scarce open road spaces.
- Improved human productivity
With cars doing most or all of the driving, we’ll be free to make the most of our time spent in the vehicle—spending that time reading books, watching a game, interacting with family members, and even getting some work done!
The list goes on...
So, today we’re excited to release MADRaS for the community to kickstart research into making FAD a reality. Because it can host multiple learning agents in the same environment at the same time, this simulator, built on top of TORCS, can be used to try out and benchmark existing and new multi-agent learning algorithms for self-driving cars, such as Multi-Agent Deep Deterministic Policy Gradient (MADDPG) and PSMADDPG, among others. And since it extends TORCS, it supports all the single-agent learning algorithms as well. Scripts for training a DDPG agent are provided as a sample.
Check out the following video for an overview of the features and the general interface.
- Check out the GitHub* repository and the wiki for the project.
- Check out videos of a sample Deep Deterministic Policy Gradients (DDPG) agent that has learned to drive in traffic.
This project was developed by Abhishek Naik and Anirban Santara (an Intel® Student Ambassador for AI) during their internship at the Parallel Computing Lab, Intel Labs, Bangalore, India. This project was driven by Intel’s urge to address the absence of an open source multi-agent autonomous driving simulator that can be utilized by machine learning (particularly, reinforcement learning) scientists to rapidly prototype and evaluate their ideas. Although the system was developed and optimized entirely on the Intel® Core™ i7 processor and Intel® Xeon® processors, we believe that it would run smoothly on other x86 platforms, too. Currently, we are working on integrating MADRaS with the Intel® Nervana™ platform Reinforcement Learning Coach and we invite the community to participate in its development.
Please feel free to report any incompatibility or bug by creating an issue in the GitHub repository. We hope MADRaS enables new and veteran researchers in academia and the industry to make this FAD a reality!
Abhishek Naik: IIT Madras
Anirban Santara: Intel Student Ambassador for AI, IIT Kharagpur
Balaraman Ravindran: Head, RBC-DSAI, IIT Madras
Bharat Kaul: Parallel Computing Lab, Intel Labs