Browsing by Author "Theron, Pieter Zacharias"
- Automatic acquisition of two-level morphological rules (Stellenbosch : Stellenbosch University, 1999-02) Theron, Pieter Zacharias; Cloete, Ian; Stellenbosch University. Faculty of Sciences. Dept. of Mathematical Sciences. ENGLISH SUMMARY: There are numerous applications for computational systems with a natural language processing capability. All these applications, which include free-text information retrieval, machine translation and computer-assisted language learning, require a detailed and correctly structured database (or lexicon) of language information on all the levels of language analysis (phonology, morphology, syntax, semantics, etc.). Hand-coding this information can be time-consuming and error-prone. An alternative approach is to automate the lexicon construction process. The contribution of this thesis is a method to automatically construct rule sets for the morphological and phonological levels of language analysis. The particular computational morphological framework used is that of two-level morphology. The lexicon, which contains the language-specific information of two-level analyzers/generators, consists of two components: (1) a morphotactic description of the words to be processed, and (2) a set of two-level phonological (or spelling) rules. The input to the acquisition process is source-target word pairs, where the target is an inflected form of the source word. It is assumed that the target word is formed from the source through the optional addition of a prefix and/or a suffix. There are two phases in the acquisition process: (1) segmentation of the target into morphemes and (2) determination of the optimal two-level rule set with minimal discerning contexts. In phase one, an acyclic deterministic finite state automaton (ADFSA) is constructed from string edit sequences of the input pairs.
Segmentation of the words into morphemes is achieved by viewing the ADFSA as a directed acyclic graph (DAG) and applying heuristics that use properties of the DAG as well as the elementary string edit operations. For phase two, the determination of the optimal rule set is made possible by a novel representation of rule contexts, with morpheme boundaries added, in a new DAG. We introduce the notion of a delimiter edge. Delimiter edges are used to select the correct two-level rule type as well as to extract minimal discerning rule contexts from the DAG. To illustrate the language independence of the acquisition method, results are presented for English adjectives, Xhosa noun locatives, Afrikaans noun plurals and Spanish adjectives. Furthermore, it is shown how rules are acquired from thousands of input source-target word pairs. Finally, the excellent generalization of an acquired rule set is shown by applying a rule set, after slight manual modification, to previously unseen words. The recognition accuracy on unseen words was 98.9%, while the generation accuracy was 97.8%.
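As a rough illustration of phase one, the sketch below computes a string edit sequence for each source-target pair and stores the sequences in a trie, which is one simple kind of acyclic deterministic automaton. The summary does not give the thesis's edit costs, tie-breaking, or automaton construction, so several things here are assumptions made only for illustration: unit-cost Levenshtein alignment, the symbol `0` for the empty string, an unminimized trie, and the names `edit_sequence`, `build_trie`, `NULL` and `FINAL`.

```python
NULL = "0"        # assumed symbol for the empty string in an edit pair
FINAL = "#final"  # hypothetical marker for an accepting trie node


def edit_sequence(source, target):
    """Return one minimal unit-cost alignment as (source_char, target_char) pairs."""
    n, m = len(source), len(target)
    # cost[i][j] = edit distance between source[:i] and target[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # copy or substitute
                             cost[i - 1][j] + 1,        # delete from source
                             cost[i][j - 1] + 1)        # insert into target

    # Trace back from the bottom-right corner, preferring the diagonal.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if source[i - 1] == target[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                pairs.append((source[i - 1], target[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((source[i - 1], NULL))
            i -= 1
        else:
            pairs.append((NULL, target[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def build_trie(sequences):
    """Store edit sequences in a nested-dict trie (deterministic and acyclic)."""
    root = {}
    for seq in sequences:
        node = root
        for pair in seq:
            node = node.setdefault(pair, {})
        node[FINAL] = True
    return root


if __name__ == "__main__":
    word_pairs = [("happy", "unhappy"), ("kind", "unkind")]
    sequences = [edit_sequence(s, t) for s, t in word_pairs]
    for (s, t), seq in zip(word_pairs, sequences):
        print(s, "->", t, ":", " ".join(f"{a}:{b}" for a, b in seq))
    trie = build_trie(sequences)
```

For the pair `happy`/`unhappy`, the alignment begins with the two insertions `0:u 0:n`, and both example pairs share that path from the trie root before branching. Shared insertion paths of this kind are the sort of structure a segmentation heuristic could treat as a candidate prefix, although the heuristics actually used in the thesis are not described in the summary.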