Investigating fatal road accident data

Van Niekerk, Andri (2007-03)

Thesis (MScIng) -- University of Stellenbosch, 2007.

ENGLISH ABSTRACT: This thesis investigates four analysis techniques in terms of their utility and adequacy for analysing fatal road accident data in South Africa.

PROBLEM DEFINITION: Road accident data are summarized annually in various forms, but the relationships between the different categorical variables are not determined. The study aimed to address this problem. Road accident rates are published so that year-to-year changes in an accident rate can be compared. It was necessary to investigate a method for determining whether these year-to-year changes are statistically significant and whether an increase in an accident rate is necessarily a reason for concern. Multiple regression models that also include qualitative variables were investigated as well.

ACCIDENT DATA AND ANALYSIS TECHNIQUES: Road accident data were available as an MS Access database that could be investigated manually. Traffic and speed data were readily available from Mikros Traffic Monitoring (Pty) Ltd in the form of SANRAL's CTO Yearbooks and were found to be reliable and sufficiently detailed. Road geometric data were omitted from the study because insufficient detail was available. All data showed some degree of poor data quality, so certain variables, e.g. the age group variable, were omitted from the study. The fatal road accident database was analysed using Correspondence Analysis and Association Rules (for the categorical variables), and the Poisson distribution for chance variation analysis and Multiple Regression Analysis (for the continuous variables).

METHODOLOGY: Fatal road accident data were gathered by running queries on the fatal road accident database. Traffic and speed data were gathered by manually investigating the SANRAL CTO Yearbooks and manipulating the data so that they could be integrated with the fatal road accident database. After all data manipulation was completed, the four analysis techniques mentioned above were applied using the software package Statistica.

FINDINGS: Correspondence Analysis and Association Rules were found to be adequate for analysing categorical road accident variables, although data quality limitations and insufficient sampling constrained the results. The time period used for chance variation analysis was too short to deliver significant results. Three multiple regression models were created, one of which was able to predict the number of fatalities per fatal accident with k equal to approximately 40%.

CONCLUSIONS AND RECOMMENDATIONS: The following conclusions are drawn and recommendations are made based on the findings of this study:
- Detailed, good-quality road accident data for South Africa are unavailable. Better quality data are urgently needed for the purpose of analysis.
- Correspondence Analysis is found to be the most appropriate technique for road accident data analysis and should be applied on an annual basis.
- Association Rules Analysis results are influenced by small sample sizes and too many unknown variable categories. Larger sample sizes and exclusion of the unknown categories might improve the results.
- The analysis period for chance variation is too short, and a longer period will provide more significant results.
- The multiple regression model predicting the number of fatalities per fatal accident is accepted in terms of utility and adequacy.
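
The chance variation question raised in the abstract (is a year-to-year increase larger than random fluctuation would explain?) can be illustrated with a minimal Python sketch. It assumes that the annual count follows a Poisson distribution whose mean equals a baseline count, and it uses a one-sided test at a 5% level. The function names, the baseline choice, and the threshold are illustrative assumptions, not the procedure or the Statistica workflow used in the thesis.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for X ~ Poisson(mean), computed in log space for stability."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def poisson_upper_tail(observed: int, mean: float) -> float:
    """P(X >= observed) for X ~ Poisson(mean); mean must be positive."""
    below = sum(poisson_pmf(k, mean) for k in range(observed))
    return max(0.0, 1.0 - below)


def increase_is_significant(observed: int, mean: float, alpha: float = 0.05) -> bool:
    """True if a count this high is unlikely under chance variation alone.

    `mean` is the assumed baseline (for example a multi-year average);
    `alpha` is an assumed significance level.
    """
    return poisson_upper_tail(observed, mean) < alpha


# Hypothetical counts, not taken from the thesis data.
print(increase_is_significant(105, 100.0))
print(increase_is_significant(130, 100.0))
```

Under these assumptions a small rise over the baseline is consistent with chance variation, while a much larger rise is not. A baseline estimated from only a few years is itself uncertain, which is in line with the finding that a short analysis period limits what can be concluded.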

AFRIKAANS SUMMARY (translated into English): This thesis discusses the application of four different analysis techniques in terms of each one's suitability for investigating fatal road accidents in South Africa.

PROBLEM DEFINITION: Road accident data are summarized and published annually in various forms, but the relationships between categorical variables are not determined directly. The study attempted to address this problem. Road accident rates are published in order to observe the change in accident figures from year to year. It was necessary to investigate a method for determining when a change in accident rates is statistically significant and whether there should necessarily be reason for concern when an increase in an accident rate is observed. Multiple regression models that also include qualitative variables were investigated in this study.

ACCIDENT DATA AND ANALYSIS TECHNIQUES: Road accident data were available in an MS Access document that could be investigated by hand. Traffic and speed data were available from Mikros Traffic Monitoring (Pty) Ltd, taken from SANRAL's CTO Yearbooks. The data were reliable, available and sufficiently detailed. Geometric information for the road sections concerned was excluded from the study because insufficient detail was available. All data collected showed various levels of low data quality. Certain variables were therefore excluded, e.g. the age variable. The fatal road accident database was analysed using Correspondence Analysis and Association Rules (for the categorical variables) and by applying the Poisson distribution for chance variation and Multiple Regression Analysis (for the continuous variables).

METHODOLOGY: Fatal road accident data were gathered by running queries in the MS Access database. Traffic and speed data were gathered by investigating SANRAL's CTO Yearbooks by hand and integrating the data with the accident database. After all relevant data had been integrated with the accident database, the four above-mentioned analysis techniques were carried out using the software package Statistica.

FINDINGS: Correspondence Analysis and Association Rules are the most suitable analysis techniques for categorical variables, although relatively low data quality and insufficient sampling imposed limitations. The analysis period used for chance variation is too short to deliver statistically significant results. Three multiple regression models were set up. It was found that one of the models predicts the number of fatalities per fatal road accident with a K value of approximately 40%.

CONCLUSIONS AND RECOMMENDATIONS: The following conclusions and recommendations are made according to the findings of this study:
- Detailed, good-quality road accident data for South Africa are not available. Better data quality is urgently needed for analysis purposes.
- Correspondence Analysis is the most suitable analysis technique for road accident data analysis and should be applied annually.
- Association Rules results are strongly influenced by small sample sizes and too many unknown variable categories. Larger samples and the exclusion of unknown categories may improve the results.
- The analysis period for chance variation analysis is too short, and a longer period will deliver more significant results.
- The multiple regression model that predicts the number of fatalities per fatal accident is accepted in terms of its utility and suitability.
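
To illustrate the kind of quantity an Association Rules analysis reports for categorical accident variables, the following minimal Python sketch computes support, confidence and lift for a single rule. The record fields (`light`, `type`), their values, and the function name `rule_metrics` are hypothetical and are not taken from the thesis database or from Statistica.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    `records` is a non-empty list of dicts mapping variable name to category;
    `antecedent` and `consequent` are dicts of required variable values.
    """
    def matches(record, condition):
        return all(record.get(key) == value for key, value in condition.items())

    n = len(records)
    n_a = sum(matches(r, antecedent) for r in records)
    n_c = sum(matches(r, consequent) for r in records)
    n_ac = sum(matches(r, antecedent) and matches(r, consequent) for r in records)

    support = n_ac / n                              # share of records with both sides
    confidence = n_ac / n_a if n_a else 0.0         # P(consequent | antecedent)
    lift = confidence / (n_c / n) if n_c else 0.0   # confidence relative to base rate
    return support, confidence, lift


# Made-up records for illustration only.
records = [
    {"light": "night", "type": "pedestrian"},
    {"light": "night", "type": "pedestrian"},
    {"light": "day", "type": "head-on"},
    {"light": "night", "type": "unknown"},
]
rule = ({"light": "night"}, {"type": "pedestrian"})

print(rule_metrics(records, *rule))

# Dropping records with an unknown category changes the metrics.
known = [r for r in records if "unknown" not in r.values()]
print(rule_metrics(known, *rule))
```

With so few records, removing a single unknown entry shifts every metric noticeably, which mirrors the sensitivity to small samples and unknown categories noted in the conclusions.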

Please refer to this item in SUNScholar by using the following persistent URL: http://hdl.handle.net/10019.1/50724