Scaling multi-agent reinforcement learning to eleven aside simulated robot soccer
(Stellenbosch : Stellenbosch University, 2022-12) Smit, Andries; Engelbrecht, Herman; Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering.

ENGLISH ABSTRACT: Robot soccer, where teams of autonomous agents compete against each other, has long been regarded as a grand challenge in artificial intelligence. Despite recent successes of learned policies over heuristics and handcrafted rules in other domains, current teams in the RoboCup soccer simulation leagues still rely on handcrafted strategies and apply reinforcement learning only to small subcomponents. This limits a learning agent's ability to find strong, high-level strategies for the game in its entirety. End-to-end reinforcement learning has been applied successfully in soccer simulations with up to 4 players. However, little previous work has been done on training in settings with more than 4 players, as learning is often unstable and basic strategies take much longer to emerge. In this dissertation, we investigate whether agents can learn competent soccer strategies in a full 22-player soccer game using limited computational resources (one CPU and one GPU), from tabula rasa and entirely through self-play. To enable this investigation, we build a simplified 2D soccer simulator with significantly faster simulation times than the official RoboCup simulator, while still retaining the important challenges for multi-agent learning in the context of robot soccer. We propose various improvements to the standard single-agent proximal policy optimisation algorithm, in an effort to scale it to our multi-agent setting.
These improvements include (1) using a policy and critic network with an attention mechanism that scales linearly in the number of agents, (2) sharing networks between agents, which allows for faster throughput using batching, and (3) using Polyak-averaged opponents, with freezing of the opponent team when necessary, together with league opponents.

We show through experimental results that stable training in the full 22-player setting is possible. Agents trained in the 22-player setting learn to defeat a variety of handcrafted strategies, and also achieve a higher win rate than agents trained in the 4-player setting and evaluated in the full 22-player setting. We also evaluate our final algorithm in the RoboCup simulator and observe steady improvement in the team's performance over the course of training. Our work can guide future end-to-end multi-agent reinforcement learning teams to compete against the best handcrafted strategies available in simulated robot soccer.
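To illustrate improvement (1), one way an attention mechanism over agents can remain linear in the number of agents is to use a single fixed query that attends over per-agent feature vectors, giving one softmax-weighted pooled vector at O(N) cost. This is only a minimal sketch of the idea; the function and parameter names are illustrative assumptions, and the actual architecture in the dissertation may differ.

```python
import numpy as np

def attention_pool(agent_features, query):
    """Pool N per-agent feature vectors into one vector using a single
    attention query. Cost is linear in the number of agents N, since each
    agent contributes one score and one weighted term.

    agent_features: (N, d) array of per-agent embeddings (illustrative).
    query: (d,) attention query vector (illustrative).
    """
    d = query.shape[0]
    scores = agent_features @ query / np.sqrt(d)   # (N,) one score per agent
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over agents
    return weights @ agent_features                 # (d,) weighted sum
```

Because the pooled output has a fixed size regardless of N, the same critic head can be reused as the number of players grows from 4 to 22.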
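Improvement (3) relies on Polyak averaging, a standard technique in which the opponent's parameters trail the learner's as an exponential moving average. A minimal sketch follows, assuming a flat dict of named parameters; the names, the `tau` value, and the `frozen` flag are illustrative assumptions, not details taken from the dissertation.

```python
# Sketch of Polyak-averaged opponent parameters with optional freezing.
# Parameter names and tau are illustrative, not from the dissertation.

def polyak_update(learner_params, opponent_params, tau=0.005):
    """Move each opponent parameter a small step towards the learner's:
    theta_opp <- (1 - tau) * theta_opp + tau * theta_learner."""
    return {
        name: (1.0 - tau) * opponent_params[name] + tau * learner_params[name]
        for name in opponent_params
    }

def update_opponent(learner_params, opponent_params, tau=0.005, frozen=False):
    """Skip the update entirely while the opponent team is frozen."""
    if frozen:
        return dict(opponent_params)
    return polyak_update(learner_params, opponent_params, tau)
```

Freezing the opponent when training becomes unstable keeps the self-play target stationary until the learner recovers, after which the averaged updates resume.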