Masters Degrees (Electrical and Electronic Engineering)
Browsing Masters Degrees (Electrical and Electronic Engineering) by Author "Bakambana, Jeremie"
- Item: Learning decentralized policies with incremental reinforcement learning, reward shaping and self-play learning. (Stellenbosch : Stellenbosch University, 2023-03) Bakambana, Jeremie; Engelbrecht, Herman; Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering.

  ENGLISH ABSTRACT: Humans have the interesting ability to adapt to complex tasks by leveraging knowledge acquired on simpler tasks. In addition, humans can coordinate their behaviour to reach a common objective. Recent progress in the field of Reinforcement Learning (RL) has demonstrated that an agent can adapt to a complex task after being introduced to a simpler variant of the same task. In this study, we investigate the ability of RL agents to solve a complex task while collaborating with another learning agent. The given task is a cooperative volleyball game in three dimensions. We used the Proximal Policy Optimization (PPO) algorithm to train the agents, because PPO had already solved a simpler variant of the game, the same volleyball game in two dimensions, in a single-agent scenario. We applied Incremental RL as a training paradigm to address the reward sparsity that arises from the large state space of the experimental environment. We first investigated the problem in a single-agent scenario: we broke the main task MDP down into a sequence of incremental MDPs, generating a sequence of variants of the same task ranging from the simplest to the most complex, and then trained the agent on each task in the sequence, starting with the simplest.
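The incremental training paradigm described in the abstract can be illustrated with a toy sketch. All names here (`PPOAgent`, `incremental_training`, the scalar "skill" model, the mastery threshold) are illustrative stand-ins, not the thesis's actual implementation; a real agent would run PPO updates where the placeholder learning rule appears.

```python
# Illustrative sketch of incremental RL: train on a sequence of task
# variants ordered from simplest to most complex, reusing the policy.
# The PPOAgent below is a stand-in with a scalar "skill", NOT real PPO.

class PPOAgent:
    """Stand-in for a PPO learner whose policy carries over between tasks."""
    def __init__(self):
        self.skill = 0.0

    def train_on(self, difficulty, episodes):
        # Placeholder learning rule: skill moves toward the task's
        # difficulty; a real agent would collect rollouts and run
        # PPO policy/value updates here.
        for _ in range(episodes):
            self.skill += 0.1 * (difficulty - self.skill)

    def success_rate(self, difficulty):
        # Fraction of episodes the agent would solve at this difficulty.
        return min(1.0, self.skill / difficulty)

def incremental_training(agent, difficulties, mastery=0.8, episodes=50):
    """Train on each variant in order, advancing once the agent reaches
    a (possibly partial) mastery threshold on the current variant."""
    for d in difficulties:
        while agent.success_rate(d) < mastery:
            agent.train_on(d, episodes)
    return agent

# Curriculum from a simple 2D-like variant up to the full task.
agent = incremental_training(PPOAgent(), difficulties=[0.2, 0.5, 1.0])
```

Note that `mastery` and the spacing of `difficulties` correspond to the two parameters the thesis identifies: when to advance, and how much harder the next variant is.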
  The investigation demonstrated that: (1) the agent can adapt to an incremental sequence of MDPs; (2) reaching the optimal level of expertise in a simple variant of a task is not a prerequisite for adapting to a more complex variant of the same task: the agent can still adapt to a complex task after partially mastering a simpler variant; (3) the optimal policy learned on the final task generalizes over all previous MDPs generated by simpler variants of that task; and (4) successful incremental learning is influenced by two parameters: one controlling when the training agent transitions to a more complex variant of the given task, and another controlling how much more complex the new variant is. Based on the experimental results in the single-agent scenario, we then investigated the paradigm in cooperative multi-agent scenarios. We demonstrated that, with appropriate Reward Shaping, decentralized learning can effectively solve cooperative scenarios without necessarily tuning hyperparameters. We also showed that Incremental Learning is an effective and promising approach for addressing issues such as reward sparsity in tasks with large state spaces in the multi-agent scenario. Finally, our work demonstrated the ability of RL agents to adapt to a dynamic environment while maintaining collaboration with other agents.