An RFI simulation pipeline to help teach interferometry and machine learning

Agbetsiafa, Insight Enya Aku (2022-04)

Thesis (MEng)--Stellenbosch University, 2022.


ENGLISH ABSTRACT: An interferometer is a collection of radio antennas that together form one instrument. Machine Learning (ML) is the collective term for a set of algorithms that can automatically learn to perform a specific task if they are provided with training examples. Interferometry has become an integral part of the scientific landscape in South Africa with the advent of MeerKAT. Similarly, the use of ML to improve our lives has grown in popularity worldwide. ML is nowadays used to determine people's preferences, to interpret human speech, to automatically classify images and the like. As these two fields grow in popularity and importance within the South African context, so does the need for tools that can aid in teaching them to undergraduate students. A major problem for radio observatories worldwide is Radio Frequency Interference (RFI). RFI can be detected using ML. A simulator that can simulate interferometric observations corrupted by RFI can serve as a testbed for different ML approaches. Moreover, if the simulator is simple enough, it can even be used as a teaching tool. In this thesis such a simulator is developed. The simulator can aid in teaching students how visibilities can be simulated and how RFI can be detected via ML. In effect, it is one tool that can help teach two relevant undergraduate topics, namely interferometry and ML. In particular, an experiment is proposed which an undergraduate student can repeat to gain a deeper understanding of both. In this experiment, visibilities are simulated, RFI is injected, and the RFI is then detected using four different ML techniques, namely Naive Bayes, Logistic Regression, k-means and Gaussian Mixture Models (GMM). The results are then analysed and conclusions are drawn. For the simplistic setup considered here, the four algorithms rank from best to worst as follows: Naive Bayes, Logistic Regression, GMM and then k-means. In the future, if the simulator is extended somewhat, it can also be used as a testbed for comparing numerous other ML algorithms. The thesis also provides a comprehensive review of all the theory that a student requires to master both interferometry and ML.

AFRIKAANS SUMMARY (translated): An interferometer is a collection of radio antennas that together form one instrument. Machine learning is the collective term used to refer to a set of algorithms that can automatically learn how to perform a specific function, given training examples. Interferometry has become an important part of the scientific landscape in South Africa with the launch of MeerKAT. Similarly, the use of machine learning has grown drastically worldwide. Machine learning is nowadays used to determine people's preferences, to recognise the words people utter, to classify images and the like. As the popularity of the two fields grows, it becomes ever more important to develop applications that can help explain the two fields to undergraduate students. A major problem facing radio observatories is Radio Frequency Interference (RFI). RFI can be identified with the help of machine learning. A simulator that can generate visibility measurements contaminated with RFI can be used to compare different machine learning techniques with one another. Furthermore, if a simulator is simple enough, it can also be used as a teaching application. In this thesis such a simulator is developed. The simulator can be used to explain both interferometry and machine learning to students. More specifically, an experiment is proposed that students will be able to repeat. In the experiment, visibility measurements are generated and mixed with RFI. Four machine learning algorithms are then used to identify the RFI. The four algorithms are: Naive Bayes, Logistic Regression, Gaussian Mixture Models (GMM) and k-means. The accuracy ranking of the four algorithms, as found in the study, is the same as the order given here. If the simulator is extended, it can also be used to compare various other machine learning algorithms with one another. The thesis also contains an overview of all the theory that could help a student to master both fields.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/124527
This item appears in the following collections: