- Item: Solving sparse-reward problems in partially observable 3D environments using distributed reinforcement learning (Stellenbosch : Stellenbosch University, 2021-12)
  Louw, Jacobus Martin; Engelbrecht, Herman; Schoeman, J. C.; Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering.

  ENGLISH ABSTRACT: In this study, we address sparse-reward problems in partially observable 3D environments. The example task is set in a simulation environment where a reinforcement learning (RL) agent has to deliver a first-aid kit to an immobilised miner using an image observation. We apply a deep Q-learning algorithm with several modifications to solve this problem. We first show that augmenting the agent's observation with a history of previous observations and performed actions helps it solve problems in the partially observable environment. We then consider three main modifications to the deep Q-learning algorithm. The first is to dramatically increase the rate at which new data is generated by using a distributed system. Secondly, we utilise prioritised experience replay (PER) [39] to replay significant transitions to the agent more frequently. Lastly, we add the n-step return to the algorithm. The work by Hessel et al. [14] and Horgan et al. [16] shows that these modifications significantly improve the performance of the deep Q-learning algorithm on the Atari platform. The Atari platform consists mainly of simple 2D environments; in contrast, we consider performance on a partially observable 3D environment with sparse rewards. We confirm the results of Fedus et al. [10] and show that better-performing policies are trained when the replay buffer contains more recently generated data. We show that prioritising transitions and using the n-step return are both very important in solving the example sparse-reward problem. In addition to these modifications, we also investigate strategies to improve exploration.
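The observation-history augmentation mentioned above can be sketched as a simple history stack of (observation, action) pairs. The class name, flat observation vectors, and one-hot action encoding below are illustrative assumptions, not details taken from the thesis, where the observations are images:

```python
from collections import deque

import numpy as np


class ObservationHistory:
    """Stacks the last `history_len` (observation, action) pairs into one input.

    Illustrative sketch: a real agent for this task would stack image frames
    and action encodings along the channel dimension instead of flat vectors.
    """

    def __init__(self, history_len, obs_dim, num_actions):
        self.num_actions = num_actions
        self.history_len = history_len
        self.frames = deque(maxlen=history_len)

    def reset(self, first_obs):
        # Pad the history with copies of the first observation and a no-op action.
        blank_action = np.zeros(self.num_actions)
        for _ in range(self.history_len):
            self.frames.append(np.concatenate([first_obs, blank_action]))
        return self.stacked()

    def step(self, obs, action):
        one_hot = np.zeros(self.num_actions)
        one_hot[action] = 1.0
        self.frames.append(np.concatenate([obs, one_hot]))
        return self.stacked()

    def stacked(self):
        # Shape: (history_len * (obs_dim + num_actions),)
        return np.concatenate(list(self.frames))
```

The `deque(maxlen=...)` automatically discards the oldest pair, so the agent always conditions on a fixed-length window of recent observations and actions.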
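The proportional variant of prioritised experience replay can be summarised in two short functions: sampling probabilities from TD errors, and importance-sampling weights to correct the resulting bias. The function names and default hyperparameters are illustrative, following the conventions of the PER paper [39]:

```python
import numpy as np


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional PER: P(i) is proportional to (|delta_i| + eps)^alpha."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()


def importance_weights(probs, indices, beta=0.4):
    """Importance-sampling weights w_i = (N * P(i))^(-beta), normalised by the max."""
    n = len(probs)
    w = (n * probs[indices]) ** (-beta)
    return w / w.max()
```

Transitions with larger TD errors are replayed more often, while the weights down-scale the updates of over-sampled transitions so the expected gradient stays unbiased (fully so as beta is annealed to 1).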
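The n-step return replaces the single-step Q-learning target with a sum of n discounted rewards plus a bootstrapped value, which propagates a sparse reward back through n states in one update. A minimal sketch (the function name is hypothetical):

```python
def n_step_return(rewards, gamma, bootstrap_value):
    """Compute G_t = r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1} + gamma^n * V,
    where V is the bootstrapped value, e.g. max_a Q(s_{t+n}, a) in deep Q-learning."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With sparse rewards this matters: a single reward n steps ahead already influences the target for the current state, instead of needing n separate updates to trickle back.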
  We then demonstrate that curriculum learning (CL) or domain randomisation (DR) can be used to help the agent solve more challenging problems where the reward signal is initially difficult to obtain. Lastly, we establish that combining CL with DR greatly improves the deep Q-learning agent's performance on larger, more complex problems.
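One way CL and DR can be combined is to scale task difficulty with the curriculum stage while randomising environment appearance each episode. The parameters below (spawn distance, textures, lighting) are purely illustrative assumptions, not the thesis's actual environment configuration:

```python
import random


def sample_env_config(curriculum_stage, randomise_domain, rng=None):
    """Sample an environment configuration (illustrative parameters only).

    Curriculum learning: the maximum agent-to-miner spawn distance grows with
    the curriculum stage, so early stages make the sparse reward easy to reach.
    Domain randomisation: visual properties are resampled per episode so the
    policy does not overfit a single environment instance.
    """
    rng = rng or random.Random()
    max_distance = 5.0 * (curriculum_stage + 1)  # harder stages: miner further away
    config = {"spawn_distance": rng.uniform(1.0, max_distance)}
    if randomise_domain:
        config["wall_texture"] = rng.choice(["rock", "brick", "metal"])
        config["lighting"] = rng.uniform(0.5, 1.5)
    return config
```

A training loop would advance `curriculum_stage` once the agent's success rate on the current stage exceeds a threshold, with `randomise_domain=True` throughout.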