Scalable agent architecture for distributed training

Deep Reinforcement Learning (DeepRL) has achieved remarkable success in a range of tasks, from continuous control problems in robotics to playing games like Go and Atari. The improvements seen in these domains have so far been limited to individual tasks where a separate agent has been tuned and trained for each task.In our most recent work, we explore the challenge of training a single agent on many tasks.Today we are releasing DMLab-30, a set of new tasks that span a large variety of challenges in a visually unified environment with a common action space.Training an agent to perform well on many tasks requires massive throughput and making efficient use of every data point. To this end, we have developed a new, highly scalable agent architecture for distributed training called Importance Weighted Actor-Learner Architecture that uses a new off-policy correction algorithm called V-trace.DMLab-30DMLab-30 is a collection of new levels designed using our open source RL environment DeepMind Lab. These environments enable any DeepRL researcher to test systems on a large spectrum of interesting tasks either individually or in a multi-task setting.