Browsing by Author "Cleophas, Loek"
Now showing 1 - 5 of 5
Results Per Page
Sort Options
- ItemAn assessment of algorithms for deriving failure deterministic finite automata(South African Institute of Computer Scientists and Information Technologists, 2017) Nxumalo, Madoda; Kourie, Derrick G.; Cleophas, Loek; Watson, Bruce W.Failure deterministic finite automata (FDFAs) represent regular languages more compactly than deterministic finite automata (DFAs). Four algorithms that convert arbitrary DFAs to language-equivalent FDFAs are empirically investigated. Three are concrete variants of a previously published abstract algorithm, the DFA-Homomorphic Algorithm (DHA). The fourth builds a maximal spanning tree from the DFA to derive what it calls a delayed input DFA. A first suite of test data consists of DFAs that recognise randomised sets of finite length keywords. Since the classical Aho-Corasick algorithm builds an optimal FDFA from such a set (and only from such a set), it provides benchmark FDFAs against which the performance of the general algorithms can be compared. A second suite of test data consists of random DFAs generated by a specially designed algorithm that also builds language-equivalent FDFAs, some of which may have non-divergent cycles. These random FDFAs provide (not necessarily tight) lower bounds for assessing the effectiveness of the four general FDFA generating algorithms.
- ItemN-gram representations for comment filtering(ACM, Inc., 2015-09) Brand, Dirk; Kroon, Steve; Van der Merwe, Brink; Cleophas, LoekAccurate classifiers for short texts are valuable assets in many applications. Especially in online communities, where users contribute to content in the form of posts and comments, an effective way of automatically categorising posts proves highly valuable. This paper investigates the use of N- grams as features for short text classification, and compares it to manual feature design techniques that have been popu- lar in this domain. We find that the N-gram representations greatly outperform manual feature extraction techniques.
- ItemA taxonomy of minimisation algorithms for deterministic tree automata(J.UCS Consortium, 2016) Bjorklund, Johanna; Cleophas, LoekWe present a taxonomy of algorithms for minimising deterministic bottomup tree automata (dtas) over ranked and ordered trees. Automata of this type and its extensions are used in many application areas, including natural language processing (nlp) and code generation. In practice, dtas can grow very large, but minimisation keeps things manageable. The proposed taxonomy serves as a unifying framework that makes algorithms accessible and comparable, and as a foundation for efficient implementation. Taxonomies of this type are also convenient for correctness and complexity analysis, as results can frequently be propagated through the hierarchy. The taxonomy described herein covers a broad spectrum of algorithms, ranging from novel to well-studied ones, with a focus on computational complexity.
- ItemTool support for correctness-by-construction(Springer, 2019) Runge, Tobias; Schaefer, Ina; Cleophas, Loek; Thum, Thomas; Kourie, Derrick; Watson, Bruce W.Correctness-by-Construction (CbC) is an approach to incrementally create formally correct programs guided by pre- and postcondition specifications. A program is created using refinement rules that guarantee the resulting implementation is correct with respect to the specification. Although CbC is supposed to lead to code with a low defect rate, it is not prevalent, especially because appropriate tool support is missing. To promote CbC, we provide tool support for CbC-based program development. We present CorC, a graphical and textual IDE to create programs in a simple while-language following the CbC approach. Starting with a specification, our open source tool supports CbC developers in refining a program by a sequence of refinement steps and in verifying the correctness of these refinement steps using the theorem prover KeY. We evaluated the tool with a set of standard examples on CbC where we reveal errors in the provided specification. The evaluation shows that our tool reduces the verification time in comparison to post-hoc verification.
- ItemWeak factor automata : the failure of failure factor oracles?(South African Institute of Computer Scientists and Information Technologists, 2014-08) Cleophas, Loek; Kourie, Derrick G.; Watson, Bruce W.In indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a sequence. One efficient, compact representation is the factor oracle (FO). At the same time, any classical deterministic finite automata (DFA) can be transformed to a so-called failure one (FDFA), which may use failure transitions to replace multiple symbol transitions, potentially yielding a more compact representation. We combine the two ideas and directly construct a failure factor oracle (FFO) from a given sequence, in contrast to ex post facto transformation to an FDFA. The algorithm is suitable for both short and long sequences. We empirically compared the resulting FFOs and FOs on number of transitions for many DNA sequences of lengths 4 − 512, showing gains of up to 10% in total number of transitions, with failure transitions also taking up less space than symbol transitions. The resulting FFOs can be used for indexing, as well as in a variant of the FO-using backward oracle matching algorithm. We discuss and classify this pattern matching algorithm in terms of the keyword pattern matching taxonomies of Watson, Cleophas and Zwaan. We also empirically compared the use of FOs and FFOs in such backward reading pattern matching algorithms, using both DNA and natural language (English) data sets. The results indicate that the decrease in pattern matching performance of an algorithm using an FFO instead of an FO may outweigh the gain in representation space by using an FFO instead of an FO.