N-gram representations for comment filtering

Date
2015-09
Publisher
ACM, Inc.
Abstract
Accurate classifiers for short texts are valuable assets in many applications. Especially in online communities, where users contribute to content in the form of posts and comments, an effective way of automatically categorising posts proves highly valuable. This paper investigates the use of N-grams as features for short text classification, and compares it to manual feature design techniques that have been popular in this domain. We find that the N-gram representations greatly outperform manual feature extraction techniques.
Description
CITATION: Brand, D., Kroon, S., Van der Merwe, B. & Cleophas, L. 2015. N-Gram Representations For Comment Filtering in Proceeding SAICSIT '15. Proceedings of the 2015 Annual Research Conference on South African Institute of Computer Scientists and Information Technologists, Article No. 6. STIAS, Wallenberg Centre, Stellenbosch, South Africa. 28-30 September 2015. doi:10.1145/2815782.2815789.
The original publication is available at http://dl.acm.org/authorize.cfm?key=N08849
SAICSIT '15. Proceedings of the 2015 Annual Research Conference on South African Institute of Computer Scientists and Information Technologists, Article No. 6. September 2015.
Keywords
N-gram models, Computational linguistics, Texts -- Electronic analysis, Online texts -- Classification, Information filtering systems, Vector spaces, Text mining