MASRI - Maltese Automatic Speech Recognition

MASRI is developing the first Automatic Speech Recognition technologies for Maltese.

Il-proġett MASRI qed jiżviluppa t-teknoloġiji għat-traskrizzjoni awtomatika bil-Malti.

MASRI

We are interested in a broad variety of problems in Speech Recognition for under-resourced languages, especially Maltese.

Aħna interessati f'diversi sfidi konnessi mat-traskrizzjoni awtomatika b'lingwi bi ftit riżorsi, speċjalment il-Malti.

Support | Fondi

MASRI is supported by a University of Malta Research Fund Excellence grant.
MASRI għandu l-fondi mill-Fond tar-Riċerka tal-Università ta' Malta.

Data

We are creating speech corpora and investigating data augmentation techniques.
Qed noħolqu korpora tat-taħdit u ninvestigaw mezzi biex nawmentaw id-data.

Technologies | Teknoloġiji

We investigate neural techniques for speech-to-text, forced alignment, and related tasks.
Qed ninvestigaw teknoloġija bbażata fuq xbiek newrali għat-traskrizzjoni awtomatika, forced alignment, eċċ.

Neħtieġu l-għajnuna tiegħek

Biex nibnu sistemi li jużaw il-vuċi aħjar neħtieġu għadd kbir ta' recordings ta' taħdit bil-Malti. Għaldaqstant, qed nużaw il-Common Voice, proġett tal-Mozilla mfassal sabiex jgħin fil-ħolqien ta' sistemi tat-taħdit għal-lingwi kollha mitkellma madwar id-dinja. Permezz ta' dan, qed niġbru kampjuni ta' taħdit minn kelliema madwar Malta u Għawdex.

  • Agħfas il-buttuna biex iżżur il-paġna ta' Common Voice.
  • Qis li tagħżel il-Malti bħala l-lingwa li trid tuża.
  • Jekk trid, oħloq kont. Dan iħallik tippersonalizza l-esperjenza u tneħħi kwalunkwe data li tkun tajtna.
  • Ibda rrekordja l-vuċi tiegħek billi taqra ftit sentenzi kuljum.
  • Isma' r-rekordings ta' ħaddieħor u għidilna humiex tajbin jew le.

Ara vidjow dwar kif taħdem is-sistema.

We need your help

To build better voice-activated systems, we need large amounts of recorded Maltese speech. We are therefore using Common Voice, a Mozilla project designed to help build voice-activated systems for all the world's languages, to collect speech samples from speakers across the Maltese islands.

  • Click the button to visit the Common Voice page.
  • Make sure you choose Maltese (Malti) as your language.
  • If you want, you can create an account. An account lets you personalise the experience and delete your data should you wish to.
  • Start recording your voice by reading a few sentences a day.
  • Listen to recordings made by other people and tell us if they're good or not.

Watch this video to see how it works (in Maltese).

Resources

Data and resources created in the MASRI project.

Riżorsi

Data u riżorsi maħluqa fil-proġett MASRI.

MASRI-HEADSET

Corpus of 8 hours of paired Maltese speech and text in Microsoft WAV format (16 kHz, 16-bit, mono).
To obtain the dataset, kindly contact us.

Korpus ta' 8 sigħat ta' taħdit bil-Malti, flimkien mat-traskrizzjonijiet, disponibbli fil-format Microsoft WAV (16 kHz, 16-bit, mono).
Biex takkwista l-korpus, ikkuntattjana.
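
For anyone preparing recordings to use alongside the corpus, the stated audio format (16 kHz, 16-bit, mono WAV) can be checked with Python's standard `wave` module. The following is only a minimal sketch: the function names `check_wav_format` and `duration_seconds` and the file name `sample.wav` are hypothetical and are not part of any MASRI tooling.

```python
import wave


def check_wav_format(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return True if the WAV file matches the expected format.

    Defaults correspond to 16 kHz, 16-bit (2 bytes per sample), mono.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getsampwidth() == sample_width_bytes
            and wav.getnchannels() == channels
        )


def duration_seconds(path):
    """Return the duration of the WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    # "sample.wav" is a placeholder path used for illustration only.
    path = "sample.wav"
    try:
        print("Format OK:", check_wav_format(path))
        print("Duration (s):", duration_seconds(path))
    except FileNotFoundError:
        print("No file found at", path)
```

Summing `duration_seconds` over a directory of files is one simple way to confirm the total amount of audio in a collection.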

MASRI-HEADSET splits
Taqsimiet tal-MASRI-HEADSET

These files reproduce the train/test splits of MASRI-HEADSET used in the experiments reported in our paper, MASRI-HEADSET: A Maltese Corpus for Speech Recognition.
Dawn il-fajls jirriproduċu t-taqsimiet fil-MASRI-HEADSET użati fl-esperimenti rrappurtati fl-artiklu MASRI-HEADSET: A Maltese Corpus for Speech Recognition.

G2P Tool for Maltese
Sistema G2P għall-Malti

A new Python 3 implementation of a grapheme-to-phoneme (G2P) tool originally developed for a Maltese text-to-speech system. Code is available here.
Implimentazzjoni bil-Python 3 ta' programm għat-traskrizzjoni fonetika, oriġinarjament żviluppat fi ħdan proġett għal text to speech bil-Malti. Kodiċi disponibbli hawn.

Publications

Project research papers

Pubblikazzjonijiet

Artikli dwar ir-riċerka tal-proġett.

Mena, C; Gatt, A; DeMarco, A; Borg, C; van der Plas, L; Muscat, A and Padovani, I. (2020). MASRI-HEADSET: A Maltese corpus for speech recognition. Proceedings of the 12th edition of the Language Resources and Evaluation Conference (LREC'20). Marseille, France: ELRA.

People

Project members and associated research assistants.

Membri

Membri tal-proġett u assistenti tar-riċerka.

Albert Gatt

Andrea DeMarco

Claudia Borg

Lonneke van der Plas

Carlos Mena

Alexandra Vella

Amanda Muscat

Ian Padovani

Kirsty Azzopardi

Ayrton Didier Brincat

Contact

Ikkuntattjana

Institute of Linguistics and Language Technology
University of Malta
Msida MSD2080

masri {at} um.edu.mt
