Reinforcement learning for routing in communication networks

Andrag, Walter H. (2003-04)

Thesis (MSc)--Stellenbosch University, 2003.

Thesis

ENGLISH ABSTRACT: Routing policies for packet-switched communication networks must be able to adapt to changing traffic patterns and topologies. We study the feasibility of implementing an adaptive routing policy using the Q-Learning algorithm which learns sequences of actions from delayed rewards. The Q-Routing algorithm adapts a network's routing policy based on local information alone and converges toward an optimal solution. We demonstrate that Q-Routing is a viable alternative to other adaptive routing methods such as Bellman-Ford. We also study variations of Q-Routing designed to better explore possible routes and to take into consideration limited buffer size and optimize multiple objectives.

AFRIKAANSE OPSOMMING:Die roetering in kommunikasienetwerke moet kan aanpas by veranderings in netwerktopologie en verkeersverspreidings. Ons bestudeer die bruikbaarheid van 'n aanpasbare roeteringsalgoritme gebaseer op die "Q-Learning"-algoritme wat dit moontlik maak om 'n reeks besluite te kan neem gebaseer op vertraagde vergoedings. Die roeteringsalgoritme gebruik slegs nabygelee inligting om roeteringsbesluite te maak en konvergeer na 'n optimale oplossing. Ons demonstreer dat die roeteringsalgoritme 'n goeie alternatief vir aanpasbare roetering is, aangesien dit in baie opsigte beter vaar as die Bellman-Ford algoritme. Ons bestudeer ook variasies van die roeteringsalgoritme wat beter paaie kan ontdek, minder geheue gebruik by netwerkelemente, en wat meer as een doelfunksie kan optimeer.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/53570
This item appears in the following collections: