Advancing automated Maltese spell checking using deep learning

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/146924

Title:	Advancing automated Maltese spell checking using deep learning
Authors:	Fava, Owen (2026)
Keywords:	Maltese language -- Orthography and spelling; Natural language processing (Computer science); Natural language generation (Computer science); Corpora (Linguistics); Computer algorithms -- Evaluation
Issue Date:	2026
Citation:	Fava, O. (2026). Advancing automated Maltese spell checking using deep learning (Master’s dissertation).
Abstract:	Research in Natural Language Processing (NLP) for Maltese has advanced in recent years, yet reliable and contextually aware spell checking support remains limited. Existing tools, such as spelling.mt, provide dictionary-based correction but cannot handle grammatical or context-sensitive errors, leaving a clear gap compared with high-resource languages. This dissertation investigates whether a Large Language Model (LLM) can be fine-tuned to develop an effective and resource-efficient spell checker for Maltese. The study adapts Meta’s Large Language Model Meta AI (LLaMA)-3-8B-Instruct model using pairs of incorrect and corrected Maltese words and sentences. A custom fine-tuning corpus was created by introducing linguistically motivated synthetic errors into correctly written text and incorporating smaller sets of authentic incorrect-correct pairs from prior research. Two complementary models were trained: a word-level model that corrects individual tokens and a sentence-level model that corrects grammatical and orthographic errors within full sentences. Their performance was evaluated across synthetic, real-world, and correct-input datasets. The main contribution of this dissertation is the development of the first LLM-based spell checking models specifically fine-tuned for Maltese, supported by a curated error-correction dataset and a systematic multi-setting evaluation. To our knowledge, this study is the first to explore the use of a modern generative LLM to build a Maltese spell checker and assess its potential in a low-resource setting. The developed models demonstrated strong performance on incorrect-correct pairs with synthetically generated errors, with the word-level model achieving 84.64% accuracy and the sentence-level model reaching 95.3%. For real-world evaluation, only the sentence-level model was tested because no suitable word-level datasets are available. In this context, the accuracy of the sentence-level model dropped to 29.6%, reflecting the increased variability of naturally occurring mistakes. However, metrics such as Error Annotation Toolkit (ERRANT), Bilingual Evaluation Understudy (BLEU), and Grammar Language Evaluation Understudy (GLEU) returned promising values, indicating that the model often produced corrections close to the target. Overall, the findings demonstrate that fine-tuning an LLM offers a promising pathway towards effective Maltese spell checking, while revealing key challenges related to generalisation and data scarcity. The study provides an initial benchmark and outlines concrete directions for developing more robust, deployable spell checking solutions for low-resource languages.
Description:	M.Sc.(Melit.)
URI:	https://www.um.edu.mt/library/oar/handle/123456789/146924
Appears in Collections:	Dissertations - FacICT - 2026; Dissertations - FacICTAI - 2026

Files in This Item:

File	Description	Size	Format
2619ICTICS520005085879_1.PDF		3.06 MB	Adobe PDF
