Title: Incorporating an error corpus into a spellchecker for Maltese
Authors: Rosner, Mike
Gatt, Albert
Joachimsen, Jan
Attard, Andrew
Keywords: Natural language processing (Computer science)
Corpora (Linguistics)
Linguistic analysis (Linguistics)
Reference (Linguistics)
Word (Linguistics)
Issue Date: 2012
Publisher: European Language Resources Association (ELRA)
Citation: Rosner, M., Gatt, A., Attard, A., & Joachimsen, J. (2012). Incorporating an error corpus into a spellchecker for Maltese. 8th International Conference on Language Resources and Evaluation (LREC), Istanbul. 743-750.
Abstract: This paper discusses the ongoing development of a new Maltese spellchecker, highlighting the methodologies which would best suit such a language. We thus discuss several previous attempts, highlighting what we believe to be their weakest point: a lack of attention to context. Two developments are of particular interest, both of which concern the availability of language resources relevant to spellchecking: (i) the Maltese Language Resource Server (MLRS), which now includes a representative corpus of c. 100M words extracted from diverse documents including the Maltese Legislation, press releases and extracts from Maltese web-pages and (ii) an extensive and detailed corpus of spelling errors that was collected whilst part of the MLRS texts were being prepared. We describe the structure of these resources as well as the experimental approaches focused on context that we are now in a position to adopt. We describe the framework within which a variety of different approaches to spellchecking and evaluation will be carried out, and briefly discuss the first baseline system we have implemented. We conclude the paper with a roadmap for future improvements.
Appears in Collections: Scholarly Works - InsLin

Files in This Item:
File: Incorporating_an_Error_Corpus_into_a_Spe.pdf
Size: 603.08 kB
Format: Adobe PDF

Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.