A Biscriptual Morphological Transducer for Crimean Tatar

Francis M. Tyers; Jonathan N. Washington; Darya Kavitskaya; Memduh Gökırmak; Nick Howell; Remziye Berberova

doi:10.33011/computel.v1i.423

Authors

Francis M. Tyers, Indiana University
Jonathan N. Washington, Swarthmore College
Darya Kavitskaya, UC Berkeley
Memduh Gökırmak, Univerzita Karlova in Prague
Nick Howell, Higher School of Economics
Remziye Berberova, Crimean Tavrida University

Abstract

This paper describes a weighted finite-state morphological transducer for Crimean Tatar able to analyse and generate in both Latin and Cyrillic orthographies. This transducer was developed by a team including a community member and language expert, a field linguist who works with the community, a Turkologist with computational linguistics expertise, and an experienced computational linguist with Turkic expertise.

Handling two orthographic systems in a single transducer is challenging because the two orthographies use different strategies for spelling loanwords and for encoding the full range of the language's phonemes and their interactions. We build the core transducer on the Latin orthography and then design a separate transliteration transducer that maps its surface forms to Cyrillic. To help control the non-determinism in the orthographic mapping, we use weights to prioritise forms seen in the corpus. We evaluate all components of the system, finding accuracy above 90% for morphological analysis and near 90% for orthographic conversion. This constitutes the state of the art for Crimean Tatar morphological modelling and, to our knowledge, is the first single biscriptual morphological transducer for any language.
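
The idea of weighting a non-deterministic orthographic mapping can be illustrated with a small sketch. This is a toy Python illustration, not the authors' finite-state implementation: the rule table `RULES`, the counts in `CORPUS_COUNTS`, and the value of `UNSEEN_PENALTY` are all assumptions made up for the example. Each Latin symbol may map to more than one Cyrillic candidate, every combination is generated, and candidates attested in the (hypothetical) corpus receive a lower weight, so they are ranked first.

```python
from itertools import product

# Hypothetical Latin -> Cyrillic rules; one symbol is ambiguous on purpose.
# Illustrative only, not the rule set used in the paper.
RULES = {
    "d": ["д"],
    "ö": ["о", "ё"],
    "r": ["р"],
    "t": ["т"],
}

# Hypothetical counts of Cyrillic forms observed in a corpus.
CORPUS_COUNTS = {"дёрт": 3}

# Assumed cost added to any candidate that was never seen in the corpus.
UNSEEN_PENALTY = 1.0


def candidates(latin):
    """Return every Cyrillic string the rules allow for a Latin word."""
    options = [RULES[ch] for ch in latin]
    return ["".join(parts) for parts in product(*options)]


def weight(form):
    """Lower is better: attested forms cost 0, unattested ones are penalised."""
    return 0.0 if CORPUS_COUNTS.get(form, 0) > 0 else UNSEEN_PENALTY


def transliterate(latin):
    """Return (form, weight) pairs sorted from best to worst."""
    ranked = sorted(candidates(latin), key=lambda f: (weight(f), f))
    return [(f, weight(f)) for f in ranked]


if __name__ == "__main__":
    for form, w in transliterate("dört"):
        print(form, w)
```

In a weighted finite-state setting the same effect would come from weights on the transducer's paths rather than from a lookup table; the sketch only shows why preferring attested forms resolves the ambiguity.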
