Date of Award

Spring 1-1-2014

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Wayne Ward

Second Advisor

Sarel Van Vuuren

Third Advisor

Rebecca Anne Scarborough

Fourth Advisor

Daniel Bolaños

Fifth Advisor

Martha Palmer

Abstract

While many studies have focused on pronunciation modeling to improve word recognition, limited effort has been made to provide insight into which pronunciation variations are at play and how they affect word error rate. This research provides a framework for diagnostic analysis of pronunciation variation for automatic speech recognition.

Previous work on pronunciation modeling has addressed lexicon adaptation and increasing acoustic tolerance to allow for pronunciation variation. However, cross-word variations and deletions have been neglected. An N-gram phoneme-based pronunciation model is proposed here as a statistical approach to pronunciation variation modeling; it differs from previous pronunciation models in its ability to model cross-word effects and deletions. Pronunciation analyses are conducted with the proposed model on three types of speech corpora: adults' spontaneous speech, children's spontaneous speech, and adults' read speech. Results obtained at the acoustic level and the symbolic level (word/phoneme) demonstrate how the automatic speech recognition system fails at word recognition on spontaneous speech, how the three types of speech corpora differ with respect to pronunciation variation, and the potential of the N-gram based pronunciation model to achieve a better word error rate than the canonical pronunciation model.
