N-gram representations for comment filtering
CITATION: Brand, D., Kroon, S., Van der Merwe, B. & Cleophas, L. 2015. N-Gram Representations For Comment Filtering in Proceeding SAICSIT '15. Proceedings of the 2015 Annual Research Conference on South African Institute of Computer Scientists and Information Technologists, Article No. 6. STIAS, Wallenberg Centre, Stellenbosch, South Africa. 28-30 September 2015. doi:10.1145/2815782.2815789.
The original publication is available at http://dl.acm.org/authorize.cfm?key=N08849
Accurate classifiers for short texts are valuable assets in many applications. Especially in online communities, where users contribute content in the form of posts and comments, an effective way of automatically categorising posts proves highly valuable. This paper investigates the use of N-grams as features for short text classification, and compares them to manual feature design techniques that have been popular in this domain. We find that the N-gram representations greatly outperform manual feature extraction techniques.
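To make the idea of N-grams as features concrete, the following is a minimal sketch of how a short comment might be turned into N-gram counts. It is an illustration only, not the paper's pipeline: it assumes word-level N-grams, lowercasing, and whitespace tokenisation, and the function names `word_ngrams` and `ngram_counts` are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of `text` as space-joined strings.

    Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    """
    tokens = text.lower().split()
    # Slide a window of n tokens over the text; texts shorter than n yield [].
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, max_n=2):
    """Count all n-grams for n = 1..max_n; the counts act as sparse features."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts


# Hypothetical example comment; the resulting mapping from n-gram to count
# could be passed to any standard classifier as a feature vector.
features = ngram_counts("great post thanks for sharing")
```

Unlike manually designed features, a representation of this kind needs no domain-specific rules: every N-gram observed in the training comments becomes a candidate feature.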