Neural machine translation for Arabic dialects

dc.contributor.advisor: Brink, Willie
dc.contributor.author: Salim, Aya Hashim Taha
dc.contributor.other: Stellenbosch University. Faculty of Science. Dept. of Applied Mathematics.
dc.date.accessioned: 2025-04-08T09:19:04Z
dc.date.available: 2025-04-08T09:19:04Z
dc.date.issued: 2024-12
dc.description: Thesis (MSc)--Stellenbosch University, 2024.
dc.description.abstract: We explore two approaches to improve machine translation for low-resource Arabic dialects: unsupervised domain adaptation and backtranslation. Arabic dialects exhibit distinct linguistic features, with some dialects being more similar to each other than others. Leveraging this characteristic, we demonstrate that a model trained on one group of dialects can effectively translate other dialects without additional labelled data. This approach leads to improved translation quality for all dialects and reduces the gap between distinct and similar dialects. Our proposed methodology involves initially training a neural machine translation model on various dialects using parallel corpora. Subsequently, fine-tuning is performed on unlabelled data of another dialect, where the translation model is jointly trained with an unsupervised domain adaptation discriminator. We also show that backtranslation can improve the performance of a base model, by generating synthetic parallel data and selecting sentences similar in domain to those in the existing parallel corpus. Domain cosine and domain fine-tune methods are deployed using different language models, to select data from the generated parallel data. Finally, we show that a multi-dialect model utilizing both unsupervised domain adaptation and backtranslation can outperform all other versions of our models and also those from the literature.
dc.description.version: Masters
dc.format.extent: 87 pages
dc.identifier.uri: https://scholar.sun.ac.za/handle/10019.1/131902
dc.publisher: Stellenbosch : Stellenbosch University
dc.rights.holder: Stellenbosch University
dc.title: Neural machine translation for Arabic dialects
dc.type: Thesis
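
The abstract mentions selecting backtranslated sentences that are similar in domain to the existing parallel corpus, using a "domain cosine" method. The record does not describe how the thesis implements this, so the following is only a minimal sketch of one common reading of the idea: rank synthetic sentence pairs by the cosine similarity between their embedding and the centroid of in-domain sentence embeddings, then keep the top few. The function names, the `top_k` parameter, and the toy two-dimensional vectors are all hypothetical; in practice the embeddings would come from a language model.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
SentencePair = Tuple[str, str]  # (source sentence, synthetic translation)


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_by_domain_cosine(
    in_domain: Sequence[Vector],
    candidates: Sequence[Tuple[SentencePair, Vector]],
    top_k: int,
) -> List[SentencePair]:
    """Keep the top_k candidate pairs closest to the in-domain centroid.

    Assumption: each candidate already carries a sentence embedding.
    """
    center = centroid(in_domain)
    scored = [(cosine(vec, center), pair) for pair, vec in candidates]
    # Sort on the score only, highest similarity first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scored[:top_k]]


# Toy data: hypothetical 2-d "embeddings" standing in for model outputs.
in_domain_vecs = [[1.0, 0.0], [0.9, 0.1]]
synthetic = [
    (("a", "A"), [1.0, 0.05]),  # close to the in-domain centroid
    (("b", "B"), [0.0, 1.0]),   # far from it
    (("c", "C"), [0.7, 0.3]),   # in between
]
selected = select_by_domain_cosine(in_domain_vecs, synthetic, top_k=2)
```

With these toy vectors, the pair whose embedding points away from the in-domain centroid is filtered out and the two closer pairs are kept. A fixed `top_k` is only one possible cut-off; a similarity threshold would be an equally plausible choice.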
Files
Original bundle
Name: salim_neural_2024.pdf
Size: 2.4 MB
Format: Adobe Portable Document Format