dc.contributor.advisor | Bassett, Bruce | en_ZA |
dc.contributor.advisor | Brink, Willie | en_ZA |
dc.contributor.author | Nyoni, Evander EL-Tabonah | en_ZA |
dc.date.accessioned | 2021-09-20T15:26:42Z | |
dc.date.accessioned | 2021-12-22T14:14:54Z | |
dc.date.available | 2021-09-20T15:26:42Z | |
dc.date.available | 2021-12-22T14:14:54Z | |
dc.date.issued | 2021-12 | |
dc.identifier.uri | http://hdl.handle.net/10019.1/123667 | |
dc.description | Thesis (MSc)--Stellenbosch University, 2021. | en_ZA |
dc.description.abstract | ENGLISH ABSTRACT: The majority of African languages have not fully benefited from the recent advances
in machine translation due to a lack of data. Motivated by this challenge, we leverage
and compare transfer learning, multilingual learning and zero-shot learning on three
Southern Bantu languages (namely isiZulu, isiXhosa and Shona) and English. We focus
primarily on the English-to-isiZulu pair, since it has the smallest number of training
pairs (30,000 sentences), comprising just 28% of the average size of the other corpora.
We demonstrate the importance of language similarity for English-to-isiZulu
translation by comparing transfer learning and multilingual learning on the English-to-isiXhosa
(similar) and English-to-Shona (dissimilar) tasks. We further show that multilingual
learning is the best training protocol when there is sufficient data, with BLEU
score gains of 3.8 and 7.9 over transfer learning and zero-shot learning,
respectively, on the English-to-isiZulu task. Our findings show that zero-shot learning
is better than training a baseline model from scratch when little English-to-isiZulu
data is available. Our best model improves on the previous English-to-isiZulu state-of-the-art
BLEU score by more than 10 points. Taken together, our findings highlight the potential of
leveraging the inter-relations within and between South Eastern Bantu languages to improve
translations in low-resource settings. | en_ZA |
dc.description.abstract | AFRIKAANS SUMMARY: Because of a lack of data, most African languages have not fully benefited from recent
advances in machine translation. Motivated by this challenge, we use and compare
transfer learning, multilingual learning and zero-shot learning on three Southern Bantu languages
(namely isiZulu, isiXhosa and Shona) and English. We focus mainly on the English-to-isiZulu
pair, since it has the smallest number of training pairs (30,000 sentences), making up
only 28% of the average size of the other corpora. We demonstrate
the importance of language similarity in translation between English and isiZulu by
comparing transfer learning and multilingual learning on the English-to-isiXhosa (similar) and English-to-Shona
(dissimilar) tasks. We further show that multilingual learning is the best training protocol
when sufficient data is available, with BLEU score gains of 3.8 and
7.9 over transfer learning and zero-shot learning, respectively, on the English-to-isiZulu
task. Our findings show that zero-shot learning is better than training a baseline model
from scratch when little English-to-isiZulu data is available. Our best model
also improves on the previous English-to-isiZulu state-of-the-art (SOTA) BLEU score by more than 10 points. Our
findings highlight the potential of exploiting the interrelations within and between
South Eastern Bantu languages to improve translation in low-resource settings. | af_ZA |
dc.language.iso | en_ZA | en_ZA |
dc.subject | transfer learning | en_ZA |
dc.subject | multilingual learning | en_ZA |
dc.subject | zero-shot learning | en_ZA |
dc.subject | BLEU | en_ZA |
dc.title | Low-resource neural machine translation for Southern African languages | en_ZA |
dc.type | Thesis | en_ZA |