Static analysis of regular expressions

Weideman, Nicolaas Hendrik (2017-11-19)

Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences (Computer Science)

Thesis (MSc)--Stellenbosch University, 2017

ENGLISH ABSTRACT: Regular expressions are widely used throughout the programming community. In most cases, regular expressions allow pattern matching tasks to be performed efficiently, but in some instances regular expression matching can be extremely slow. Exploiting this potential slowness is known as a regular expression denial of service attack. We investigate regular expression denial of service attacks from a computational complexity and automata-theoretic point of view. We present a method for accurately modeling the matching time behaviour of a backtracking regular expression matcher using automata-theoretic methods. We analyze our models using the concept of ambiguity in nondeterministic finite-state automata. Our approach is evaluated on repositories of regular expressions often used in practice. Techniques for mitigating the vulnerability of backtracking regular expression matchers are investigated as a means to thwart regular expression denial of service attacks.
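The link between ambiguity and slow backtracking can be illustrated with a toy example. The following is a minimal sketch, not the model developed in the thesis: it assumes a hand-built one-state NFA standing in for the expression `(a|a)*`, and a hypothetical helper `count_backtracking_steps` that performs a depth-first search and counts every (state, position) visit. On an input that cannot match, the ambiguous automaton forces the search through every path, so the step count doubles with each extra `a`.

```python
def count_backtracking_steps(transitions, start, accepting, text):
    """Depth-first backtracking search over an NFA.

    transitions maps (state, symbol) to a list of target states; a target
    listed twice represents two distinct edges. Returns (matched, steps),
    where steps is the number of (state, position) visits.
    """
    steps = 0

    def explore(state, pos):
        nonlocal steps
        steps += 1
        if pos == len(text):
            return state in accepting
        # Try each outgoing edge in order; backtrack when one fails.
        for target in transitions.get((state, text[pos]), []):
            if explore(target, pos + 1):
                return True
        return False

    matched = explore(start, 0)
    return matched, steps


# Assumed toy automata (illustrative only, not taken from the thesis):
# (a|a)* has two distinct 'a' loops on its single state; a* has one.
AMBIGUOUS = {(0, "a"): [0, 0]}
UNAMBIGUOUS = {(0, "a"): [0]}

if __name__ == "__main__":
    for n in (5, 10, 15):
        text = "a" * n + "b"  # the trailing 'b' makes every path fail
        _, slow = count_backtracking_steps(AMBIGUOUS, 0, {0}, text)
        _, fast = count_backtracking_steps(UNAMBIGUOUS, 0, {0}, text)
        print(n, slow, fast)
```

In this sketch the ambiguous automaton needs 2^(n+1) - 1 steps on a failing input with n leading `a` characters, while the unambiguous one needs n + 1, which is the kind of gap an attacker would exploit.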

AFRIKAANS SUMMARY (translated): Regular expressions are frequently used in the writing of software. In most cases such expressions enable one to solve pattern recognition problems efficiently. There are, however, some situations in which this process can be extremely time-consuming. The exploitation of such vulnerabilities is known as a denial of service attack. We investigate these attacks from the viewpoint of computational complexity and automata theory. A method is given for accurately modelling the matching time of a backtracking regular expression matcher. We analyse the models using the concept of ambiguity in nondeterministic finite-state automata. The methods are tested by applying them to repositories of regular expressions that are used in practice. Techniques for reducing the vulnerability of backtracking regular expression matchers are investigated, with the aim of preventing denial of service attacks.

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/102879