Automatic assignment of diagnosis codes to free-form text medical notes

Date
2021-12
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: Clinical coding is the process of describing and categorising healthcare episodes according to standardised ontologies. The coded data have important downstream applications, including population morbidity studies, health systems planning and reimbursement. Clinical codes are generally assigned by specialist human coders based on information contained in free-form text clinical notes. This process is expensive, time-consuming and subject to human error, and it burdens scarce clinical human resources with administrative roles. An accurate automatic coding system can alleviate these problems. Clinical coding is a challenging task for machine learning systems. The source texts are often long, have a highly specialised vocabulary and contain non-standard clinician shorthand, and the code sets can contain tens of thousands of codes. We review previous work on clinical auto-coding systems and perform an empirical analysis of widely used and current state-of-the-art machine learning approaches to the problem. We propose a novel attention mechanism that takes the text description of clinical codes into account. We also construct a small pre-trained transformer model that achieves state-of-the-art performance on the MIMIC II and III ICD-9 auto-coding tasks. To the best of our knowledge, it is the first successful application of a pre-trained transformer model to this task.
AFRIKAANS ABSTRACT (translated): Clinical coding is the process of describing and categorising healthcare episodes according to standardised ontologies. The coded data have important practical applications, including studies of the disease burden in the population, health systems planning and fair reimbursement of medical practitioners. Clinical codes are usually assigned by clinically trained persons based on information contained in free-text clinical notes. This process is expensive, time-consuming and subject to human error, and it burdens scarce clinical human resources with administrative roles. An accurate automatic coding system can help to alleviate these problems. Clinical coding is a challenging task for machine learning systems. The clinical text is often long, has a specialised vocabulary and contains non-standard clinical shorthand, and the code sets can contain tens of thousands of codes. We examine previous work on clinical auto-coding systems and carry out an empirical analysis of the most common and best-in-class machine learning approaches to the problem. We propose a new attention mechanism that takes the text description of clinical codes into account during classification. We also construct a small pre-trained transformer model that surpasses current benchmarks for the MIMIC II and III ICD-9 auto-coding tasks. To the best of our knowledge, it is the first successful application of a pre-trained transformer model to this task.
Description
Thesis (MSc)--Stellenbosch University, 2021.
Keywords
Clinical auto-coding systems, Machine learning, Diagnosis related groups -- Automation, Medical codes -- Automatic control, UCTD