Conference Proceedings (Computer Science)

Permanent URI for this collection

https://scholar.sun.ac.za/handle/10019.1/96337

Comment classification for an online news domain
(2014-12) Brand, Dirk; Van der Merwe, Brink
ENGLISH ABSTRACT: In online discussion forums, comment moderation systems are often faced with the problem of establishing the value of an unseen online comment. By knowing the value of comments, the system is empowered to establish rank and to enhance the user experience. It is also useful for identifying malicious users that consistently show behaviour that is detrimental to the community. In this paper, we investigate and evaluate various machine learning techniques for automatic comment scoring. We derive a set of features that aim to capture various comment quality metrics (like relevance, informativeness and spelling) and compare it to content-based features. We investigate the correlation of these features against the community popularity of the comments. Through investigation of supervised learning techniques, we show that content-based features better serve as a predictor of popularity, while quality-based features are better suited for predicting user engagement. We also evaluate how well our classifier-based rankings correlate with community preference.
Firegaze : processing and visualizing firewall logs in the cloud
(SATNAC, 2013) Van Tonder, R.; Visser, W.
This project aims to visualise packet counts filtered by iptables at the network layer, and allows for performing network forensics in a distributed environment. For example, anomalies such as bandwidth spikes and port scans are exposed and quickly identifiable. Naturally, there are a host of tools which already perform this function. The twist with this project is that it should operate on a scalable cloud infrastructure—Nimbula Director is used as a test bed to this end. Intrusion Detection Systems and full-blown Security Information and Event Management (SIEM) solutions have their merits but are often too bulky. Cloud infrastructures rely principally on correctly configured firewalls for network-layer security. As such, Firegaze is a prototype solution which serves as a supplement to network layer security by visualizing firewall activity; it does not perform any analysis, but rather leaves it up to the system administrator to identify anomalous activity. Typically, log files are only needed once an incident occurs, or in the event of system failure. The idea behind Firegaze was to provide a solution for visualizing iptables logs in real-time, or on a historical basis. The challenge of doing this in an environment which scales has influenced the implementation greatly; logs are propagated among nodes in a hierarchical manner, and logs are inserted into a sharded MongoDB database according to a pre-aggregated reports pattern.
Flow allocation in wireless networks with selfish nodes
(SATNAC, 2013) Krzesinski, A. E.
Consider an ad hoc network where packet transmissions occur between the nodes. Optimal flow allocation in such systems can be modelled as a constrained nonlinear optimisation problem. This problem can be solved either by standard methods which assume global knowledge of the system being modelled, or by a distributed algorithm which assumes local knowledge only. We consider an ad hoc network which contains selfish nodes. A selfish node cares only about maximising its own flows and does not care about the utility that any other nodes get. Flow allocation in a network of altruistic and selfish nodes can be modelled as a constrained nonlinear optimisation problem and solved by standard methods. However, in this case a dynamic algorithm to compute the network flows is not available. We modify the behaviour of the selfish nodes so that a dynamic solution is possible. In this scheme, selfish nodes advertise false (inflated) resource prices to the other nodes. These nodes respond by not routing their flows through the selfish nodes, and the selfish nodes can now use all their resources to transmit their own flows. The flows return to their optimal values if the selfish nodes subsequently advertise the correct prices for their resources. Altruistic nodes can detect the inflated prices charged by the selfish nodes and respond by advertising false (inflated) prices to the selfish nodes. In this case the flows originating at the selfish nodes are reduced, but the flows do not return to their optimal values. This scheme also has a distributed solution.
Learning dynamics of linear denoising autoencoders
(PMLR, 2018) Pretorius, Arnu; Kroon, Steve; Kamper, Herman
Denoising autoencoders (DAEs) have proven useful for unsupervised representation learning, but a thorough theoretical understanding is still lacking of how the input noise influences learning. Here we develop theory for how noise influences learning in DAEs. By focusing on linear DAEs, we are able to derive analytic expressions that exactly describe their learning dynamics. We verify our theoretical predictions with simulations as well as experiments on MNIST and CIFAR-10. The theory illustrates how, when tuned correctly, noise allows DAEs to ignore low variance directions in the inputs while learning to reconstruct them. Furthermore, in a comparison of the learning dynamics of DAEs to standard regularised autoencoders, we show that noise has a similar regularisation effect to weight decay, but with faster training dynamics. We also show that our theoretical predictions approximate learning dynamics on real-world data and qualitatively match observed dynamics in nonlinear DAEs.
N-gram representations for comment filtering
(ACM, Inc., 2015-09) Brand, Dirk; Kroon, Steve; Van der Merwe, Brink; Cleophas, Loek
Accurate classifiers for short texts are valuable assets in many applications. Especially in online communities, where users contribute to content in the form of posts and comments, an effective way of automatically categorising posts proves highly valuable. This paper investigates the use of N-grams as features for short text classification, and compares it to manual feature design techniques that have been popular in this domain. We find that the N-gram representations greatly outperform manual feature extraction techniques.
Unsupervised pre-training for fully convolutional neural networks
(Institute of Electrical and Electronics Engineers, 2016) Wiehman, Stiaan; Kroon, Steve; De Villiers, Hendrik
Unsupervised pre-training of neural networks has been shown to act as a regularization technique, improving performance and reducing model variance. Recently, fully convolutional networks (FCNs) have shown state-of-the-art results on various semantic segmentation tasks. Unfortunately, there is no efficient approach available for FCNs to benefit from unsupervised pre-training. Given the unique property of FCNs to output segmentation maps, we explore a novel variation of unsupervised pre-training specifically designed for FCNs. We extend an existing FCN, called U-net, to facilitate end-to-end unsupervised pre-training and apply it on the ISBI 2012 EM segmentation challenge data set. We performed a battery of significance tests for both equality of means and equality of variance, and show that our results are consistent with previous work on unsupervised pre-training obtained from much smaller networks. We conclude that end-to-end unsupervised pre-training for FCNs adds robustness to random initialization, thus reducing model variance.
