Comparison of methods to calculate measures of inequality based on interval data

Date
2015-12
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: In recent decades, economists and sociologists have taken an increasing interest in the study of income attainment and income inequality. Many of these studies have used census data, but social surveys are increasingly used as sources as well. In these surveys, respondents’ incomes are most often recorded not as exact amounts but in categories, the last of which is open-ended. The reason is that income is regarded as sensitive information and is sometimes difficult for respondents to report. Continuous data divided into categories are often more difficult to work with than ungrouped data. In this study, we compare different methods of converting grouped data to data in which each observation has a specific value or point. Some methods assign the same value to all the observations in an interval; an example is the midpoint method, where every observation in an interval is assigned the interval's midpoint. Others are random methods, where each observation receives a random point between the lower and upper bounds of its interval. For some methods, both random and non-random, a distribution is fitted to the data and a value is calculated according to that distribution. The non-random methods that we use are the midpoint, Pareto means and lognormal means methods; the random methods are the random midpoint, random Pareto and random lognormal methods. Since our focus is on income data, which usually follow a heavy-tailed distribution, we use the Pareto and lognormal distributions in our methods. These methods are applied to simulated and real datasets whose raw values are known. The raw values are first categorised into intervals, and each method is then applied to the interval data to convert them back to point data. To test the effectiveness of the methods, we calculate several measures of inequality: the Gini coefficient, the quintile share ratio (QSR), the Theil measure and the Atkinson measure. The estimated measures of inequality, calculated from the dataset produced by each method, are then compared with the true measures of inequality calculated from the raw values.
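As a rough illustration of the workflow the abstract describes, the following Python sketch converts interval counts to point values and then computes two of the inequality measures. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the example intervals, and the handling of the open-ended top interval (a caller-supplied value or upper bound) are all hypothetical, and the random variant shown is a plain uniform draw within each interval, which may differ from the thesis's random midpoint method. The Pareto and lognormal variants are not shown.

```python
import math
import random


def midpoint_impute(counts, bounds, open_value):
    """Give every observation in a closed interval the interval midpoint.

    The open-ended top interval (upper bound None) has no midpoint, so the
    caller supplies open_value. This choice is an assumption of the sketch.
    """
    values = []
    for (lower, upper), n in zip(bounds, counts):
        point = open_value if upper is None else (lower + upper) / 2
        values.extend([point] * n)
    return values


def random_uniform_impute(counts, bounds, open_upper, rng):
    """Give each observation a uniform random point within its interval.

    open_upper is a hypothetical cap used for the open-ended interval.
    """
    values = []
    for (lower, upper), n in zip(bounds, counts):
        high = open_upper if upper is None else upper
        values.extend(rng.uniform(lower, high) for _ in range(n))
    return values


def gini(values):
    """Gini coefficient from the rank-weighted sum of the sorted values."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * sum(xs)) - (n + 1) / n


def theil(values):
    """Theil T index; zero incomes contribute 0 (the limit of x log x)."""
    mean = sum(values) / len(values)
    terms = ((x / mean) * math.log(x / mean) for x in values if x > 0)
    return sum(terms) / len(values)


# Hypothetical grouped data: three intervals, the last one open-ended.
counts = [3, 2, 1]
bounds = [(0, 1000), (1000, 3000), (3000, None)]

mid_values = midpoint_impute(counts, bounds, open_value=4500)
rand_values = random_uniform_impute(counts, bounds, open_upper=6000,
                                    rng=random.Random(0))

print("Gini (midpoint):", gini(mid_values))
print("Gini (uniform): ", gini(rand_values))
print("Theil (midpoint):", theil(mid_values))
```

In a comparison like the one the abstract outlines, the same measures would also be computed on the known raw values, so that each method's estimate can be set against the true figure.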
AFRIKAANS SUMMARY: Over the past decades, economists and sociologists have shown an increasing interest in studies of income attainment and income inequality. Many of these studies use census data, but the use of social surveys as sources for the analyses has also increased markedly. In these surveys, a person's income is mostly recorded in categories, of which the last interval is open, rather than as numerical values. The reason for the categories is that income data are regarded as sensitive and are sometimes also difficult to report. Continuous data divided into categories are usually more difficult to work with than ungrouped data. In this study, several methods are compared for converting grouped data to data in which each observation has a numerical value. For some of the methods, the same value is given to all the observations in an interval, for example the midpoint method, where each observation receives the midpoint of the interval. Other methods are random methods, where each observation receives a random value between the lower and upper bounds of the interval. For some of the methods, random and non-random, a distribution is fitted to the data and a value is calculated according to the distribution. The non-random methods used are the midpoint, Pareto means and lognormal means methods, and the random methods are the random midpoint, random Pareto and random lognormal methods. Our focus is on income data, which usually follow a heavy-tailed distribution, and for this reason we use the Pareto and lognormal distributions in our methods. All the methods are applied to simulated and real datasets. The raw values of the datasets are known and are categorised into intervals. The methods are then applied to the interval data to convert them back to data in which each observation has a numerical value. To test the effectiveness of the methods, several measures of inequality are calculated. The measures include the Gini coefficient, the quintile share ratio (QSR), and the Theil and Atkinson measures. The estimated measures of inequality, calculated from the datasets obtained through the methods, are then compared with the true measures of inequality.
Description
Thesis (MComm)—Stellenbosch University, 2015.
Keywords
Interval data, UCTD, Income distribution -- Statistical methods