In total, the University of Tartu's machine translation engine supports 23 Finno-Ugric languages: in addition to the more commonly supported Estonian, Finnish and Hungarian, it now includes Livonian, Võro, Proper Karelian, Livvi Karelian, Ludian, Veps, Northern Sami, Southern Sami, Inari Sami, Skolt Sami, Lule Sami, Komi, Komi-Permyak, Udmurt, Hill Mari, Meadow Mari, Erzya, Moksha, Mansi and Khanty.

Most of these languages have been added to a public translation engine for the first time, as they are not part of Google Translate or similar services.

The University of Tartu NLP researchers, pictured from left to right: Research Fellow Lisa Yankovskaya, Professor Mark Fišel and Scientific Programmer Maali Tars. Image: Henry Narits.

“We started working with Finno-Ugric languages in 2021, with the first system supporting Võro, Northern Sami and Southern Sami,” said Maali Tars, Scientific Programmer at the Institute of Computer Science at the University of Tartu.

According to her, Livonian was added to the machine translation engine the same year. Livonian is an extremely endangered language with only about 20 near-native speakers.

In the future, the researchers will continue to improve the quality of the current machine translation system and intend to include more Finno-Ugric languages and dialects.

“There are several reasons for developing machine translation for low-resource languages. For example, philologists and other interested parties need translation from these languages to understand texts, folklore, etc., without learning the language. Translating into these languages is a way of preserving endangered languages and supporting the speakers,” said Lisa Yankovskaya, Research Fellow in Natural Language Processing at the University of Tartu.

She added that this is why the translation system is unrestricted and open to all users, and the software and created models are open-source.

Improving translation quality

The research group invites speakers and researchers of these languages to contribute corrected translations to improve translation quality. This can be done by editing translations at translate.ut.ee. Texts such as poems, articles, books and other written content in these languages are also of great help and can be sent to ping@tartunlp.ai.

Yankovskaya explained that feedback is needed to improve the translation quality because many of these languages have extremely scarce resources for creating such translation systems.

“This means two things: first, the translation quality can vary a lot, and it can be especially low when translating into low-resource languages. Second, we need the help of speakers of those languages by having them contribute correct translations on our platform,” noted Yankovskaya.

The work was carried out in collaboration with the Livonian Institute at the University of Latvia, Võro Institute, the University of Eastern Finland, the Karelian language revitalization programme of the University of Eastern Finland and the Arctic University of Norway.

The work is funded by the National Programme of Estonian Language Technology.

The machine translation engine Neurotõlge is available at translate.ut.ee.

Author: Henry Narits, the University of Tartu, Estonia