Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/115279

| Title: | Grammatical analysis of Maltese text |
| Authors: | Busuttil, Alana (2023) |
| Keywords: | Maltese language -- Data processing; Natural language processing (Computer science); Machine learning |
| Issue Date: | 2023 |
| Citation: | Busuttil, A. (2023). Grammatical analysis of Maltese text (Bachelor's dissertation). |
| Abstract: | This work looks at grammatical inference for Maltese, with a particular focus on the morphological analysis of Maltese words. Maltese is a hybrid language: its Semitic influence is mostly visible in its grammar (which follows a root-and-pattern conjugation system), while Italian and English exert a strong lexical influence. Given these influences, this project approaches grammatical inference from both a monolingual and a multilingual perspective. Maltese is also a low-resource language, with minimal human-annotated data for grammatical inference. A neural approach is used to train a series of models that take a Maltese word as input and produce its grammatical analysis as output. The dataset for these models was adapted from UniMorph, a universal morphological annotation project. The research focuses on creating and comparing a number of computational models. The baseline approach builds a Recurrent Neural Network (RNN) from scratch and uses the training data to learn the grammatical mappings of Maltese words. The remaining models are pre-trained models fine-tuned for the task of grammatical analysis. The first two pre-trained models are BERT and mBERT, a monolingual English model and a multilingual model (104 languages) respectively; neither includes Maltese in its pre-training data. The next models are BERTu, a Maltese monolingual model, and mBERTu, a version of mBERT further pre-trained on Maltese data. Beyond comparing language models, the project also experimented with cross-lingual transfer and zero-shot learning, which help in understanding to what extent grammatical information generalises to related languages with little or no training data. The final results compare the various models and language settings to determine which performs best for Maltese grammatical inference. It was ultimately concluded that the Maltese pre-trained models outperformed their BERT counterparts, with mBERTu performing best on the task of morphological analysis. This project is the first to use neural networks for grammatical inference in Maltese, where previous approaches relied on statistical classification methods, and it establishes a baseline for future research on Maltese grammatical analysis. |
| Description: | B.Sc. IT (Hons)(Melit.) |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/115279 |
| Appears in Collections: | Dissertations - FacICT - 2023; Dissertations - FacICTAI - 2023 |
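The abstract above notes that the training data was adapted from UniMorph, which stores each inflected form as a tab-separated (lemma, form, feature bundle) triple, with morphological features joined by semicolons. The following is a minimal sketch of how such entries could be turned into (word, analysis) training pairs for the kind of models described; the Maltese entries shown are illustrative examples, not taken from the dissertation's actual dataset:

```python
# Sketch: turning UniMorph-style lines into (form -> feature bundle)
# training pairs for morphological analysis.
# The Maltese entries below are illustrative, not the dissertation's data.

def parse_unimorph(lines):
    """Parse lines of the form: lemma<TAB>form<TAB>feat;feat;..."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        lemma, form, feats = line.split("\t")
        pairs.append((form, feats.split(";")))
    return pairs

sample = [
    "kiteb\tkiteb\tV;PFV;3;SG;MASC",   # 'he wrote'
    "kiteb\tkitbet\tV;PFV;3;SG;FEM",   # 'she wrote'
]

for form, feats in parse_unimorph(sample):
    print(form, "->", feats)
```

Each surface form becomes a model input and its semicolon-delimited feature bundle the target analysis, which is the input/output mapping the abstract describes.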
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 2308ICTICT390900015463_1.PDF | Restricted Access | 1.56 MB | Adobe PDF | Request a copy |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
