Parallel Monte-Carlo tree search in distributed environments

Christoph, Marc (2020-03)

Thesis (MSc)--Stellenbosch University, 2020.

Thesis

ENGLISH ABSTRACT: Parallelising Monte-Carlo Tree Search (MCTS) holds the promise of being an effective way to improve the effectiveness of the search, given some time constraint. Thus, finding scalable parallelisation techniques has been an important area of research since MCTS was first proposed. The inherently serial nature of MCTS makes effective parallelisation difficult, since care must be taken to ensure that all threads or processes have access to accurate statistics. This is more challenging in distributed-memory environments due to the latency incurred by network communication.Prior proposals of distributed MCTS have presented their results on different domains and hardware setups, making them difficult to compare.To try to improve this state of affairs, we use the actor-based framework Akka to implement and compare various distributed MCTS algorithms on a common domain—the board game Lines of Action (LOA). We describe our implementation and evaluate the scalability of each approach in terms of play outs per second (PPS), unique nodes searched per second (NPS), and playing strength, STRACTiii. We observe that distributed root parallelisation provides the best PPS scalability, but has relatively poor scalability in terms of NPS. We contrast this with distributed tree parallelisation which scales well in terms of NPS but performs poorly in terms of PPS. Distributed leaf parallelisation is shown to scale up to 128 compute nodes in terms of PPS, but its NPS scalability is limited by its single compute node that manages the tree.We determine that distributed root parallelisation combined with tree parallelisation is the strongest of the distributed MCTS algorithms, with none of our other implementations managing a win-rate of more than 50% against the algorithm. We show that distributed root/leaf parallelisation, as well as our distributed leaf parallelisation with a multi-threaded traverser scale well in terms of playing strength. Distributed tree parallelisation viaTDS, df-UCT and UCT-Treesplit is shown to have limited playing strength scalability, and we provide possible avenues for future work that may resolve this limited performance.We hope that these findings will provide future researchers with sufficient recommendations for implementing distributed MCTS programs.

AFRIKAANSE OPSOMMING: Parallelisering van die Monte-Carlo Boomsoektog algoritme (MCTS)blyk ‘n effektiewe manier te wees om die doeltreffendheid van soektogte,onderhewig aan ’n gegewe tydsbeperking, te verbeter. Dus, is die ontwikkeling van skaaleerbare parallelisasietegnieke ‘n belangrike navorsingsarea sedert MCTS voorgestel is. Die inherente sekwensiële aard van MCTS maak effektiewe parallelisering moeilik, en tegnieke wat verseker dat alle liggewigprosesse toegang tot akkurate statistieke het, moet ondersoek word. Die deel van statistieke is meer uitdagend in verspreide geheue omgewings as gevolg van die latensie wat veroorsaak deur netwerkkommunikasie.Vorige voorstelle van verspreide MCTS algoritmes is getoets vir verskillende take en hardeware, wat dit moeilik maak om hulle resultate met mekaar te vergelyk. Dus gebruik ons die akteur-gebaseerde raamwerk Akka om verskillende verspreide MCTS-algoritmes te implementeer en op dieselfde taak—die bordspel Lines of Action (LOA)—te toets en te vergelyk.Ons beskryf die implementasie daarvan en evalueer die skaaleerbaarheid van elke benadering in terme van simulasies per sekonde (PPS), unieke nodusse per sekonde (NPS) en spelkrag. Ons bewys dat verspreide wortelparallelisering die beste UPS-skaaleerbaarheidbied, maar relatief swak skaaleerbaarheid in terme van NPS. Ons kontrasteerdit met verspreide boomparallelisering wat goed skaaleer in terme van NPS,maar wat swak presteer in terme van PPS. Daar word getoon dat verspreide blaarparallelisering tot 128 berekenings-nodusse in terme van PPS skaaleer,maar dat die NPS-skaaleerbaarheid daarvan beperk word deur die gebruik van ‘n enkele nodus wat die boom bestuur. Ons bepaal dat verspreide wortelparallelisering gekombineer met boom-parallelisering die sterkste is van die verspreide MCTS-algoritmes, en geen van ons ander implementerings het ‘n wenkoers van meer as 50 % teen hierdie algoritme nie. Ons toon aan dat verspreide wortel/blaarparallelisering,sowel as ons verspreide blaarparallelisering met ‘n multi-liggewigproses deurstapalgoritme, goed skaleer ten opsigte van spelkrag. Daar word getoon dat verspreide boomparallalisering swak spelkrag-skaaleerbaarheidtoon, en ons bied idees vir toekomstige werk wat hierdie swak prestasiemoontlik kan oplos.Ons hoop dat hierdie bevindings sal toekomstige navorsers voldoende aanbevelings sal gee vir die implementering van verspreide MCTS-programme.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/108014
This item appears in the following collections: