Finding Sami Cognates with a Character-Based NMT Approach

Mika Hämäläinen; Jack Reuter

doi:10.33011/computel.v1i.395

Authors

Mika Hämäläinen, University of Helsinki
Jack Reuter, University of Helsinki


Abstract

We approach the problem of expanding the set of known cognate relations with a character-based sequence-to-sequence NMT model. The language pair of interest, Skolt Sami and North Sami, has too little parallel data to train an NMT model directly. We address this in two ways: by training the model on cognate pairs between North Sami and other Uralic languages, and by generating additional synthetic training data with an SMT model. The cognates found using our method are publicly available in the Online Dictionary of Uralic Languages.
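Character-level NMT models are usually trained on parallel text in which each word is split into space-separated characters. The sketch below shows one way cognate pairs (real or SMT-generated) might be turned into such training lines. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the `<tag>` prefix used to mark the source language when pooling several Uralic languages into one model, the `_` placeholder for spaces, and the deduplication step are all hypothetical choices.

```python
import unicodedata
from typing import Iterable, List, Tuple


def to_char_tokens(word: str) -> str:
    """Split a word into space-separated characters for a character-level model.

    NFC normalization (an assumption) merges a base letter and a combining
    diacritic into one code point where a precomposed form exists, so such a
    letter becomes a single token. Spaces inside multiword entries are mapped
    to "_" (a hypothetical convention) so they stay distinguishable from the
    token separator.
    """
    word = unicodedata.normalize("NFC", word)
    return " ".join("_" if ch == " " else ch for ch in word)


def build_parallel_lines(
    pairs: Iterable[Tuple[str, str]],
    source_lang_tag: str,
) -> Tuple[List[str], List[str]]:
    """Turn (source word, target word) cognate pairs into parallel lines.

    Each source line is prefixed with a language tag (hypothetical) so that
    pairs from several source languages can share one model. Duplicate pairs,
    e.g. where synthetic SMT output repeats a known cognate, are dropped.
    """
    src_lines: List[str] = []
    tgt_lines: List[str] = []
    seen = set()
    for src, tgt in pairs:
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        src_lines.append(f"<{source_lang_tag}> {to_char_tokens(src)}")
        tgt_lines.append(to_char_tokens(tgt))
    return src_lines, tgt_lines
```

With placeholder pairs `[("ab", "cd"), ("x", "y")]` and the tag `sms`, the source lines would be `<sms> a b` and `<sms> x`, and the target lines `c d` and `y`. The two lists can then be written to separate source and target files, which is the input format most NMT toolkits expect.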
