N-gram representations for comment filtering

Brand, Dirk ; Kroon, Steve ; Van Der Merwe, Brink ; Cleophas, Loek (2015-09)

CITATION: Brand, D., Kroon, S., Van der Merwe, B. & Cleophas, L. 2015. N-Gram Representations For Comment Filtering in Proceeding SAICSIT '15. Proceedings of the 2015 Annual Research Conference on South African Institute of Computer Scientists and Information Technologists, Article No. 6. STIAS, Wallenberg Centre, Stellenbosch, South Africa. 28-30 September 2015. doi:10.1145/2815782.2815789.

The original publication is available at http://dl.acm.org/authorize.cfm?key=N08849

SAICSIT '15. Proceedings of the 2015 Annual Research Conference on South African Institute of Computer Scientists and Information Technologists, Article No. 6. September 2015.

Conference Paper

Accurate classifiers for short texts are valuable assets in many applications. Especially in online communities, where users contribute to content in the form of posts and comments, an effective way of automatically categorising posts proves highly valuable. This paper investigates the use of N-grams as features for short text classification, and compares it to manual feature design techniques that have been popular in this domain. We find that the N-gram representations greatly outperform manual feature extraction techniques.
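
To make the term concrete, the following is a minimal sketch of how a short comment could be turned into N-gram count features. It is an illustration only and not the authors' pipeline: the abstract does not say whether word or character N-grams were used, nor how the text was tokenised, so the function names (`word_ngrams`, `char_ngrams`, `ngram_features`), the lowercasing, and the whitespace tokenisation are all assumptions made for this example.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level N-grams of a text as space-joined strings.

    Assumption: lowercase the text and split on whitespace.
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return the character-level N-grams of a text (lowercased)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_features(text, n, unit="word"):
    """Count N-grams in a text; the counts can serve as a sparse feature vector."""
    grams = word_ngrams(text, n) if unit == "word" else char_ngrams(text, n)
    return Counter(grams)


if __name__ == "__main__":
    comment = "Great post thanks for the great post"
    # Word bigram counts for one hypothetical comment.
    print(ngram_features(comment, 2))
    # Character trigram counts for the same comment.
    print(ngram_features(comment, 3, unit="char"))
```

Counts of this kind would typically be passed to a standard classifier. Unlike manually designed features, they need no domain-specific rules beyond the choice of N and of the unit (word or character).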

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/98228