Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/135374

| Title: | Exploring fine‐tuning and dialect transfer in speech‐to‐text translation for Swiss German |
| Authors: | Bär, Martin (2024) |
| Keywords: | German language -- Dialects -- Switzerland; Switzerland -- Languages; Diglossia (Linguistics); Computational linguistics; Natural language processing (Computer science) |
| Issue Date: | 2024 |
| Citation: | Bär, M. (2024). Exploring fine‐tuning and dialect transfer in speech‐to‐text translation for Swiss German (Master’s dissertation). |
| Abstract: | Swiss German is a dialect continuum of Alemannic dialects of German spoken by over 5 million people. As in most diglossic communities, dialects vary not only compared to the standard variety but also among each other. The myriad lexical particularities and variety on both phonological and grammatical levels make developing NLP models for Swiss German and other languages with rich dialectal variety challenging. The present work aims to contribute to developing speech‐to‐text translation systems for Swiss German and investigate the impact of multidialectal variety on model performance. We explore different fine‐tuning techniques and data setups for the multilingually pre‐trained XLS‐R model. We find that using Swiss German data from varying domains increases the model’s robustness and use this as a baseline. Moreover, we observe that the average performance decreases by 1.48 BLEU when applying ASR pre‐training. When mixing Swiss and Standard German data, performance drops by 2.29 BLEU. Additionally, our findings indicate that further pre‐training does not improve results, which we attribute to poor data quality. Furthermore, there is a tendency for single‐dialect performance to decline when training on multiple dialects. However, introducing small amounts of dialectal variety can enhance performance for low‐resource dialects. |
| Description: | M.Sc. (HLST)(Melit.) |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/135374 |
| Appears in Collections: | Dissertations - FacICT - 2024; Dissertations - FacICTAI - 2024 |
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| 2518ICTCSA531005079267_1.PDF (Restricted Access) | | 7.75 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
