Comment classification for an online news domain
Please cite as follows:
Brand, D. & Van Der Merwe, B. 2014. Comment classification for an online news domain, in Proceedings of the First International Conference on the use of Mobile Informations and Communication Technology (ICT) in Africa UMICTA 2014, 9-10 December 2014, STIAS Conference Centre, Stellenbosch: Stellenbosch University, Department of Electrical & Electronic Engineering, South Africa, ISBN: 978-0-7972-1533-7.
The conference is available at http://mtn.sun.ac.za/conference2014/
See also the record http://hdl.handle.net/10019.1/95703
In online discussion forums, comment moderation systems often face the problem of establishing the value of an unseen online comment. Knowing the value of comments allows the system to rank them and to enhance the user experience. It also helps to identify malicious users who consistently show behaviour that is detrimental to the community. In this paper, we investigate and evaluate various machine learning techniques for automatic comment scoring. We derive a set of features that aim to capture various comment quality metrics (such as relevance, informativeness and spelling) and compare them to content-based features. We investigate the correlation of these features with the community popularity of the comments. Through investigation of supervised learning techniques, we show that content-based features serve better as a predictor of popularity, while quality-based features are better suited to predicting user engagement. We also evaluate how well our classifier-based rankings correlate with community preference.
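The abstract does not say which correlation measure is used to compare classifier-based rankings with community preference. As an illustration only, the sketch below assumes Spearman's rank correlation, a common choice for comparing two orderings. The function names and the sample scores and vote counts are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: classifier scores and community votes for four comments.
classifier_scores = [0.9, 0.2, 0.6, 0.4]
community_votes = [120, 3, 40, 15]
rho = spearman(classifier_scores, community_votes)
print(f"Spearman rho = {rho:.3f}")
```

A value of rho near 1 would mean the classifier orders comments much as the community does, a value near -1 would mean the orderings are reversed, and a value near 0 would indicate no monotonic relationship.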