Static analysis of regular expressions

Date
2017-11-19
Journal Title
Journal ISSN
Volume Title
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT : Regular expressions are widely used throughout the programming community. In most cases, regular expressions allow for pattern matching tasks to be performed efficiently, but in some instances regular expression matching can be extremely slow. The exploit of the potential slowness of regular expression matching, is known as a regular expression denial of service attack. We investigate regular expression denial of service attacks, by approaching it from a computational complexity and automata theoretic point of view. A method for accurately modeling the matching time behaviour of a backtracking regular expression matcher, by using automata theoretic methods, is presented. We analyze our models by using the concept of ambiguity in nondeterministic finite-state automata. Our approach is evaluated on repositories of regular expressions often used in practice. Techniques for mitigating the vulnerability of backtracking regular expression matchers are investigated as a means to thwart regular expression denial of service attacks.
AFRIKAANSE OPSOMMING : Reguliere uitdrukkings word gereeld gebruik in die skryf van sagteware. In die meeste gevalle stel sulke uitdrukkings mens in staat om patroonherkenningsprobleme op ’n doeltreffende manier op te los. Daar is egter sommige situasies waar hierdie proses uiters tydrowend kan wees. Die uitbuiting van sulke kwesbaarhede staan as ’n diensontseggingaanval bekend. Ons ondersoek hierdie aanvalle vanuit die oogpunt van berekeningskompleksiteit en outomateteorie. ’n Metode word gegee om die herkenningstyd van ’n terugspoor herkenner van reguliere uitdrukkings akkuraat te modelleer. Ons analiseer die modelle deur gebruik te maak van die konsep van dubbelsinnigheid in nie-deterministiese eindigetoestand-outomate. Die metodes word getoets deur dit toe te pas op magasyne van reguliere uitdrukkings wat in die praktyk gebruik word. Tegnieke om die kwesbaarheid van terugspoor herkenners van reguliere uitdrukkings te verbeter word ondersoek, met die doelwit om diensontseggingaanvalle te voorkom.
Description
Thesis (MSc)--Stellenbosch University, 2017
Keywords
Computer programming language -- Regular expressions, UCTD, Automata theory, Computer programming language -- Denial-of-service attack (DoS attack), Computer programming language -- Backtracking matchers, Computer software -- Static analysis techniques
Citation