What did they cover? : a cluster analysis of news stories published in the Botswana Daily News, January – December 2004
Thesis (MPhil (Information Science))--University of Stellenbosch, 2005.
ENGLISH ABSTRACT: This study undertook a cluster analysis of news stories published in the Botswana Daily News from January to December 2004. The study was exploratory in nature and sought to identify the topics that predominated during that period. Our approach comprised three phases: data collection, document pre-processing, and cluster analysis. The data used in the study was downloaded from the Botswana Daily News website using a simple program developed specifically for that purpose. Document pre-processing transformed the raw documents into a format that the clustering algorithms could operate on directly. The documents were represented using the vector space model with the tf.idf term weighting scheme. We experimented with three clustering approaches: direct k-way clustering, k-way clustering through repeated bisections, and agglomerative clustering. Agglomerative clustering performed poorly, so we discarded its results. Direct k-way clustering and k-way clustering through repeated bisections produced similar results, though the former performed better in terms of the external isolation and internal cohesion of the clusters produced. Consequently, we retained only the results from direct k-way clustering and used that algorithm alone for a subsequent quarterly analysis of the corpus. Analysis of the complete corpus identified a number of topics that were prevalent over the study period. Interestingly, the quarterly analysis revealed other topics whose prevalence appears to have been limited to certain parts of the year.
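As a rough illustration of the pipeline the abstract describes (tf.idf vectors in the vector space model, followed by partitional clustering), the sketch below may help. It is not the thesis's implementation: the toy corpus, the function names, the raw-count tf × log(N/df) weighting, and the use of plain k-means on cosine similarity as a stand-in for direct k-way clustering are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one unit-length tf.idf vector (dict: term -> weight) per tokenised document.

    Assumed weighting: raw term frequency times log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: f * math.log(n / df[t]) for t, f in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, seeds, iterations=10):
    """Partition vectors into len(seeds) clusters; seeds are indices of initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = []
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the normalised sum of its members.
        for c in range(len(centroids)):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    for t, w in v.items():
                        total[t] += w
            if not total:
                continue  # keep the old centroid if the cluster is empty
            norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
            centroids[c] = {t: w / norm for t, w in total.items()}
    return labels


# Hypothetical four-story corpus, already tokenised: two politics, two sport.
docs = [
    "parliament debates budget bill".split(),
    "minister presents budget parliament".split(),
    "zebras win football match".split(),
    "football league match postponed".split(),
]
vectors = tfidf_vectors(docs)
labels = kmeans(vectors, seeds=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

Normalising every vector to unit length lets cosine similarity reduce to a dot product. The same similarity measure could also serve to compare clusterings on internal cohesion (average similarity within a cluster) and external isolation (average similarity between clusters), the two criteria the abstract mentions.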