Reinforcement learning in the Minecraft gaming environment

Reynard, Matthew (2020-03)

Thesis (MEng)--Stellenbosch University, 2020.

Thesis

ENGLISH ABSTRACT: With the long-term goal of surviving the night in Minecraft, we ask whether a reinforcement learning agent learns better by first learning the skills needed for smaller tasks in a complex environment, or by learning those skills in the complex environment from the start. We investigate this question empirically in a non-trivial game environment. We build on the premise of curriculum learning, in which an agent learns different skills in independent, isolated sub-environments that we call dojos. The skills learned in the dojos are then used as the agent's actions: at each decision point, the agent chooses the skill that best applies to the current game state. We evaluate this approach with experiments in the Minecraft gaming environment. We find that our Dojo learning approach achieves better performance with shorter training times in certain environments. Its main benefit over traditional methods is that the reward function for each action can be finely tuned within that action's dojo. However, the skills learned in the individual dojos become the limiting factor in performance, because the agent is unable to combine them effectively when placed in certain complex environments. This limitation can be mitigated by further training the dojo modules, which allows the agent to achieve results similar to those of a standard deep Q-network.
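The abstract does not specify how the skill-selecting agent is implemented, so the following is only a minimal sketch of the general idea under stated assumptions: previously learned skills become the action set of a higher-level Q-learner. It swaps the deep Q-network for tabular Q-learning, replaces dojo-trained policies with hard-coded stand-ins, and uses a hypothetical two-state toy task. The names `SKILLS`, `step`, `train_skill_selector` and `greedy_skill` are illustrative, not taken from the thesis.

```python
import random
from typing import Callable, Dict, Tuple

# Hypothetical interface: a skill maps an observed state to a primitive action.
Skill = Callable[[int], str]

# Hard-coded stand-ins for skill policies that would (hypothetically) be trained in separate dojos.
SKILLS: Dict[str, Skill] = {
    "fight": lambda state: "attack",
    "build": lambda state: "place_block",
}

# Toy two-state task (an assumption, not the thesis environment):
# state 0 = enemy nearby, state 1 = night falling.
BEST_PRIMITIVE = {0: "attack", 1: "place_block"}


def step(state: int, primitive: str, rng: random.Random) -> Tuple[float, int]:
    """Return (reward, next_state); the next state is drawn uniformly at random."""
    reward = 1.0 if BEST_PRIMITIVE[state] == primitive else 0.0
    return reward, rng.randint(0, 1)


def train_skill_selector(steps: int = 5000, alpha: float = 0.1, gamma: float = 0.5,
                         epsilon: float = 0.2, seed: int = 0) -> Dict[Tuple[int, str], float]:
    """Tabular Q-learning whose actions are whole skills rather than primitive moves."""
    rng = random.Random(seed)
    names = list(SKILLS)
    q = {(s, n): 0.0 for s in (0, 1) for n in names}
    state = rng.randint(0, 1)
    for _ in range(steps):
        # Epsilon-greedy choice over skills (ties go to the first skill listed).
        if rng.random() < epsilon:
            skill = rng.choice(names)
        else:
            skill = max(names, key=lambda n: q[(state, n)])
        # The chosen skill decides the primitive action for this step.
        reward, next_state = step(state, SKILLS[skill](state), rng)
        # One-step Q-learning target: r + gamma * max_a' Q(s', a').
        target = reward + gamma * max(q[(next_state, n)] for n in names)
        q[(state, skill)] += alpha * (target - q[(state, skill)])
        state = next_state
    return q


def greedy_skill(q: Dict[Tuple[int, str], float], state: int) -> str:
    """Return the skill with the highest learned Q-value in the given state."""
    return max(SKILLS, key=lambda n: q[(state, n)])


if __name__ == "__main__":
    q_table = train_skill_selector()
    for s in (0, 1):
        print(s, greedy_skill(q_table, s))
```

Because the skills are fixed here, the high-level learner only has to learn which skill to invoke in each state. This is the narrower learning problem the abstract attributes to Dojo learning. It also shows the limitation the abstract describes: the agent can never do better than the best available skill in a given state.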

AFRIKAANS SUMMARY: With the long-term goal of surviving a night in Minecraft, we ask whether reinforcement learning learns better by first learning the skills to perform smaller tasks in a complex environment, or by learning the skills in the complex environment itself. This is investigated in a challenging game environment. We use curriculum learning, in which an agent learns different skills in independent, isolated sub-environments referred to as dojos. The skills learned in the dojos are then used as different actions, with the agent deciding which skill best applies to the current game state. We evaluate this experimentally in the Minecraft game environment. We find that our Dojo learning approach performs better, with faster training times, in certain environments. The most important advantage of this approach over traditional methods is that the reward functions in the dojos can be finely tuned for each action. However, the skills learned in the individual dojos become the limiting factor, since the agent is unable to combine these skills effectively when placed in certain complex environments. This can be mitigated if the dojo modules are trained further to deliver results similar to those of a standard deep Q-network.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/107868