Doctor of Philosophy (PhD)
Sarel Van Vuuren
Rebecca Anne Scarborough
While many studies have focused on pronunciation modeling to improve word recognition, little effort has been made to provide insight into which pronunciation variations are at play and how they affect word error rate. This research provides a framework for diagnostic analysis of pronunciation variation in automatic speech recognition.
Previous work on pronunciation modeling has addressed lexicon adaptation and increasing acoustic tolerance to accommodate pronunciation variation. However, cross-word variations and deletions have been neglected. An N-gram phoneme-based pronunciation model is proposed here as a statistical approach to pronunciation variation modeling; it differs from previous pronunciation models in its ability to model cross-word effects and deletions. Pronunciation analyses are conducted with the proposed model on three types of speech corpora: adults' spontaneous speech, children's spontaneous speech, and adults' read speech. Results obtained at the acoustic level and at the symbolic (word/phoneme) level demonstrate how the automatic speech recognition system fails in word recognition on spontaneous speech, how the three types of speech corpora differ with respect to pronunciation variation, and the potential for a lower word error rate with our N-gram-based pronunciation model compared with the canonical pronunciation model.
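To make the idea of an N-gram phoneme-based model with cross-word context and deletions more concrete, here is a minimal illustrative sketch. It is not the model from this thesis, whose exact formulation the abstract does not give. It assumes utterances have already been aligned into (canonical, surface) phoneme pairs, marks a deleted phoneme with a hypothetical `<del>` symbol, and trains an add-k smoothed N-gram over those pairs. The class name `PairNgramModel`, the ARPAbet-style phoneme labels, and the toy "did you" alignments are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

DEL = "<del>"          # hypothetical marker: canonical phoneme with no surface realization
BOS = ("<s>", "<s>")   # padding pair for the start of an utterance


class PairNgramModel:
    """Illustrative N-gram model over aligned (canonical, surface) phoneme pairs.

    Each utterance is one continuous pair sequence with no reset at word
    boundaries, so the history of a word-initial phoneme includes the end of
    the previous word (cross-word context). Deletions are ordinary pairs
    whose surface side is DEL.
    """

    def __init__(self, n=2, k=0.1):
        self.n = n                          # N-gram order
        self.k = k                          # add-k smoothing constant
        self.counts = defaultdict(Counter)  # history tuple -> Counter of next pairs
        self.vocab = set()

    def _events(self, pairs):
        # Yield (history, pair) events, padding the start with BOS pairs.
        padded = [BOS] * (self.n - 1) + list(pairs)
        for i in range(self.n - 1, len(padded)):
            yield tuple(padded[i - self.n + 1:i]), padded[i]

    def train(self, utterances):
        for pairs in utterances:
            for history, pair in self._events(pairs):
                self.counts[history][pair] += 1
                self.vocab.add(pair)

    def prob(self, history, pair):
        # Add-k smoothing; the +1 reserves one slot of mass for unseen pairs.
        seen = self.counts.get(history, Counter())
        v = len(self.vocab) + 1
        return (seen[pair] + self.k) / (sum(seen.values()) + self.k * v)

    def log_prob(self, pairs):
        return sum(math.log(self.prob(h, p)) for h, p in self._events(pairs))


# Toy alignments for "did you" (invented for illustration, not thesis data).
fast = [("d", "d"), ("ih", "ih"), ("d", DEL), ("y", "jh"), ("uw", "uw")]
careful = [("d", "d"), ("ih", "ih"), ("d", "d"), ("y", "y"), ("uw", "uw")]

model = PairNgramModel(n=2)
model.train([fast, fast, careful])

# The deleted word-final /d/ of "did" is the history for the word-initial /y/ of "you".
print(model.prob((("d", DEL),), ("y", "jh")) > model.prob((("d", DEL),), ("y", "y")))  # True
print(model.log_prob(fast) > model.log_prob(careful))  # True
```

Because the pair sequence is never reset at word boundaries, the deletion at the end of "did" conditions the palatalized onset of "you", which a per-word lexicon lookup cannot express. A joint-pair N-gram is only one of several ways to encode such context.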
Zheng, Jing, "Pronunciation Variation Modeling for Automatic Speech Recognition" (2014). Computer Science Graduate Theses & Dissertations. 90.