An HMM-based automatic singing transcription platform for a sight-singing tutor
Thesis (MScEng (Electrical and Electronic Engineering))--Stellenbosch University, 2008.
A singing transcription system transforming acoustic input into MIDI note sequences is presented. The transcription system is incorporated into a pronunciation-independent sight-singing tutor system, which provides note-level feedback on the accuracy with which each note in a sequence has been sung. Notes are individually modeled with hidden Markov models (HMMs), using feature vectors consisting of untuned pitch and delta-pitch. A database consisting of annotated passages sung by 26 soprano subjects was compiled for the development of the system, because no existing data was available. Various techniques that allow efficient use of a limited dataset are proposed and evaluated. Several HMM topologies are also compared, in analogy with approaches often used in the field of automatic speech recognition. Context-independent note models are evaluated first, followed by the use of explicit transition models to better identify boundaries between notes. A non-repetitive grammar is used to reduce the number of insertions. Context-dependent note models are then introduced, followed by context-dependent transition models. The aim in introducing context dependency is to improve transition region modeling, which in turn should both increase note transcription accuracy and improve the time-alignment of the notes and the transition regions. The final system transcribes sung passages with around 86% accuracy. Finally, a note-level sight-singing tutor system based on the singing transcription system is presented and a number of note sequence scoring approaches are evaluated.
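As a rough illustration of the pitch and delta-pitch features mentioned above, the following minimal sketch converts a per-frame fundamental-frequency track into fractional MIDI note numbers and pairs each value with a first-difference delta. It is not the thesis's implementation: the function names `hz_to_midi` and `pitch_features`, the simple first-difference delta, and the assumption that every frame is voiced (F0 > 0) are all illustrative choices, and the abstract does not specify how "untuned pitch" is computed.

```python
import math


def hz_to_midi(f_hz, ref_hz=440.0):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 69)."""
    return 69.0 + 12.0 * math.log2(f_hz / ref_hz)


def pitch_features(f0_track):
    """Return a list of (pitch, delta_pitch) pairs, one per frame.

    f0_track: per-frame F0 values in Hz (assumed voiced, i.e. all > 0).
    The delta is a plain first difference (an assumption made here for
    illustration); the first frame is given a delta of 0.0.
    """
    pitches = [hz_to_midi(f) for f in f0_track]
    feats = []
    prev = pitches[0] if pitches else None
    for p in pitches:
        feats.append((p, p - prev))
        prev = p
    return feats
```

In a setup like the one described, such two-dimensional vectors would serve as the observations for the note HMMs. A regression-based delta over several frames, as is common in speech recognition front ends, could be substituted for the first difference.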