Browsing by Author "Adams, Zoë-Mae"
- Sentiment classification and an approach to sentiment visualisation (Stellenbosch: Stellenbosch University, 2022-12). Adams, Zoë-Mae; Nienkemper-Swanepoel, Johané; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH SUMMARY: The social media platform Twitter provides a large amount of text data about social interactions through the Tweets posted by its users. This user-generated text contains opinions and sentiments that are considered to be biased towards the users’ individual and community experiences. In this study, text data related to the COVID-19 pandemic is collected from Twitter and used in two case studies. The first case study uses Tweets posted from three South African cities: Cape Town, Durban and Johannesburg. The second case study uses Tweets posted from three countries: South Africa, Australia and the United Kingdom. Because the text is subjective, sentiment classification is used to gain insight from the observed text data and to expose its meaning and context. Sentiment classification entails matching the pre-processed text (i.e. text elements) to terms and phrases in a sentiment lexicon to determine their sentiment polarities. This study considers two sentiment lexicons: Bing and AFINN. Sentiment visualisation is concerned with summarising the content and underlying meaning of the text as well as displaying the distinct sentiments. This study explores and enhances two existing text visualisation tools: word clouds and multiple correspondence analysis (MCA) biplots. These visualisations are used to analyse the content and gauge the underlying sentiment of the text. Word clouds provide an overview of the occurrences of words in a given context.
The word clouds are systematically enhanced by colour coding words according to their associated sentiment categories, revealing not only the most relevant topics in the text but also the overall sentiment. To evaluate the dominant sentiment of the text, the word clouds are further enhanced to display only the words that are matched to the Bing sentiment lexicon. Since fear and uncertainty were identified as relevant topics related to the pandemic, the overall sentiment within the Tweets is reflected as negative. The sentiment classification results, along with additional relevant categorical variables, are compiled into a categorical dataset suitable for MCA and biplot visualisation. In its simplest form, a biplot is regarded as a generalised scatterplot that allows observations on more than two variables to be visualised simultaneously. In this study, the MCA biplot enables the investigation of the relationships among the Tweets and the levels of the categorical variables. The proximity of points in the biplot display suggests similar response profiles and associations between the category levels under investigation. The categorical variables considered in the case studies include the location the Tweet was sent from, the overall sentiment category per Tweet and the number of words in each Tweet classifiable by the sentiment lexicon. The standard MCA biplot is enhanced through word embedding, which additionally displays the classifiable words along with the levels of the categorical variables. The number of words considered for classification is found to influence the overall sentiment classification of the Tweet. The embedded word MCA biplot confirms the consistency of the sentiment classification through the close proximity of category levels representing similar sentiment scores. Words with similar sentiment are also located in close proximity, which eases the interpretation of the underlying meaning of the Tweets.
Overall, the biplots reveal that the number of words influences the strength of the sentiment classification: a larger number of classifiable words in a Tweet is more likely to lead to a neutral sentiment, because the sentiment scores are averaged to determine the overall sentiment of that Tweet. The methodology enables the visualisation of a quantified measure of sentiment along with the associated words. These promising results therefore add to the developing field of sentiment visualisation by enhancing existing text visualisation tools to visualise the sentiments within the text.
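The lexicon matching and score averaging described in the summary can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the lexicon below is a hypothetical toy dictionary with AFINN-style integer scores (the values are invented for illustration and are not the real AFINN entries), and the names `TOY_LEXICON`, `tokenize` and `classify` are assumptions introduced here. The sketch only shows how averaging the scores of matched words can pull a Tweet with several classifiable words towards a neutral label.

```python
import re

# Hypothetical toy lexicon with AFINN-style integer scores.
# The values are invented for illustration; they are NOT the real AFINN scores.
TOY_LEXICON = {
    "fear": -2,
    "uncertain": -1,
    "terrible": -3,
    "hope": 2,
    "safe": 1,
    "great": 3,
}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify(text, lexicon=TOY_LEXICON):
    """Match tokens to the lexicon and average the matched scores.

    Returns the matched words, their mean score, and a label:
    'positive', 'negative', 'neutral', or 'unclassified' when no
    token is found in the lexicon.
    """
    matched = [word for word in tokenize(text) if word in lexicon]
    if not matched:
        return {"matched": [], "mean_score": None, "label": "unclassified"}
    mean_score = sum(lexicon[word] for word in matched) / len(matched)
    if mean_score > 0:
        label = "positive"
    elif mean_score < 0:
        label = "negative"
    else:
        label = "neutral"
    return {"matched": matched, "mean_score": mean_score, "label": label}


if __name__ == "__main__":
    # Two classifiable words, both negative: a clearly negative average.
    print(classify("Fear of the terrible virus"))
    # Four classifiable words with mixed polarity: the average cancels to zero.
    print(classify("Great hope, terrible fear"))
```

With this toy lexicon, the first example averages two negative scores, while the second averages two positive and two negative scores that cancel out, so the Tweet is labelled neutral even though every matched word is strongly polarised.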