Automatic assignment of diagnosis codes to free-form text medical notes

dc.contributor.advisor: Van der Merwe, Brink [en_ZA]
dc.contributor.author: Strydom, Stefan [en_ZA]
dc.contributor.other: Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Computer Science. [en_ZA]
dc.date.accessioned: 2021-09-05T13:16:11Z
dc.date.accessioned: 2021-12-22T14:14:14Z
dc.date.available: 2021-09-05T13:16:11Z
dc.date.available: 2021-12-22T14:14:14Z
dc.date.issued: 2021-12
dc.description: Thesis (MSc)--Stellenbosch University, 2021. [en_ZA]
dc.description.abstract: ENGLISH ABSTRACT: Clinical coding is the process of describing and categorising healthcare episodes according to standardised ontologies. The coded data have important downstream applications, including population morbidity studies, health systems planning and reimbursement. Clinical codes are generally assigned by specialist human coders based on information contained in free-form text clinical notes. This process is expensive, time-consuming, subject to human error and burdens scarce clinical human resources with administrative roles. An accurate automatic coding system can alleviate these problems. Clinical coding is a challenging task for machine learning systems. The source texts are often long, have a highly specialised vocabulary and contain non-standard clinician shorthand, and the code sets can contain tens of thousands of codes. We review previous work on clinical auto-coding systems and perform an empirical analysis of widely used and current state-of-the-art machine learning approaches to the problem. We propose a novel attention mechanism that takes the text description of clinical codes into account. We also construct a small pre-trained transformer model that achieves state-of-the-art performance on the MIMIC II and III ICD-9 auto-coding tasks. To the best of our knowledge, it is the first successful application of a pre-trained transformer model to this task. [en_ZA]
dc.description.abstract: AFRIKAANS SUMMARY (translated from Afrikaans): Clinical coding is the process of describing and categorising healthcare episodes according to standardised ontologies. The coded data have important practical applications, including studies of the disease burden in the population, health systems planning and fair remuneration of medical practitioners. Clinical codes are usually assigned by clinically trained persons on the basis of information contained in free-text clinical notes. This process is expensive, time-consuming, subject to human error and burdens scarce clinical human resources with administrative roles. An accurate automatic coding system can help to alleviate these problems. Clinical coding is a challenging task for machine learning systems. The clinical text is often long, has a specialised vocabulary and contains non-standard clinical shorthand, and the code sets can contain tens of thousands of codes. We examine previous work on clinical auto-coding systems and carry out an empirical analysis of the most common and best-in-class machine learning approaches to the problem. We propose a new attention mechanism that takes the text description of clinical codes into account during classification. We also construct a small pre-trained transformer model that surpasses current benchmarks for the MIMIC II and III ICD-9 auto-coding tasks. To the best of our knowledge, it is the first successful application of a pre-trained transformer model to this task. [af_ZA]
dc.description.version: Masters [en_ZA]
dc.format.extent: xii, 102 pages [en_ZA]
dc.identifier.uri: http://hdl.handle.net/10019.1/123654
dc.language.iso: en_ZA [en_ZA]
dc.publisher: Stellenbosch : Stellenbosch University [en_ZA]
dc.rights.holder: Stellenbosch University [en_ZA]
dc.subject: Clinical auto-coding systems [en_ZA]
dc.subject: Machine learning [en_ZA]
dc.subject: Diagnosis related groups -- Automation [en_ZA]
dc.subject: Medical codes -- Automatic control [en_ZA]
dc.subject: UCTD
dc.title: Automatic assignment of diagnosis codes to free-form text medical notes [en_ZA]
dc.type: Thesis [en_ZA]
Files
Original bundle
Name: strydom_automatic_2021.pdf
Size: 1.86 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text