This paper describes EAR (English, Arabic, Russian), a system that improves automatic ARPABET transcription by addressing the performance problems caused by transliterated Arabic and Russian words in English text. EAR has two components: (1) an n-gram language identification module that classifies an incoming unknown word as Arabic, Russian, or English, and (2) language-specific letter-to-sound rules that output a pronunciation for the word based on its classification. Our results show overall system error reductions of upwards of 45% compared to a system trained only on English.
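The two-stage design described above can be illustrated with a minimal sketch. It is not the authors' implementation: the character-trigram model with add-one smoothing, the class and function names, the tiny word lists, and the placeholder rule functions are all assumptions made for illustration, since the abstract does not specify the n-gram order, the training data, or the rule sets.

```python
import math
from collections import Counter


def char_ngrams(word, n=3):
    """Return the character n-grams of a word, padded with '#' boundaries."""
    padded = "#" * (n - 1) + word.lower() + "#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageIdentifier:
    """Hypothetical character n-gram classifier with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}    # language -> Counter of n-gram counts
        self.totals = {}    # language -> total number of n-grams seen
        self.vocab = set()  # all n-grams seen in any language

    def train(self, language, words):
        counter = self.counts.setdefault(language, Counter())
        for word in words:
            counter.update(char_ngrams(word, self.n))
        self.totals[language] = sum(counter.values())
        self.vocab.update(counter)

    def score(self, language, word):
        """Smoothed log-likelihood of the word's n-grams under one language."""
        counter = self.counts[language]
        denom = self.totals[language] + len(self.vocab)
        return sum(math.log((counter[g] + 1) / denom)
                   for g in char_ngrams(word, self.n))

    def classify(self, word):
        """Return the language whose n-gram model scores the word highest."""
        return max(self.counts, key=lambda lang: self.score(lang, word))


def transcribe(word, identifier, rules):
    """Stage 1: identify the language. Stage 2: apply that language's rules."""
    language = identifier.classify(word)
    return language, rules[language](word)


# Toy training data, invented for this sketch (not from the paper).
identifier = NgramLanguageIdentifier(n=3)
identifier.train("english", ["washington", "smith", "johnson", "bridge", "thatcher"])
identifier.train("arabic", ["abdullah", "khalid", "al-qasim", "mahmoud", "hussein"])
identifier.train("russian", ["gorbachev", "ivanov", "petrovich", "sokolov", "yeltsin"])

# Placeholder stand-ins for real letter-to-sound rule sets.
placeholder_rules = {
    lang: (lambda w, lang=lang: f"<{lang} letter-to-sound rules applied to {w!r}>")
    for lang in ("english", "arabic", "russian")
}

for name in ["ivanov", "khalid", "smith"]:
    print(transcribe(name, identifier, placeholder_rules))
```

In this sketch, the classifier only decides which rule set to apply; a real system would replace each placeholder with a function that maps spellings to ARPABET phoneme sequences and would train the identifier on far larger word lists.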
Lewis, Stephen; McGrath, Katie; and Reuppel, Jeffrey
"Language Identification and Language Specific Letter-to-Sound Rules,"
Colorado Research in Linguistics: Vol. 17.
Available at: https://scholar.colorado.edu/cril/vol17/iss1/6