Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/59678
Full metadata record
DC Field | Value | Language
dc.date.accessioned | 2020-08-21T06:22:03Z | -
dc.date.available | 2020-08-21T06:22:03Z | -
dc.date.issued | 2019 | -
dc.identifier.citation | Cardenas Acosta, R. A. (2019). Morphological process transduction : towards interpretable multi-lingual morphological analysis (Masters dissertation) | en_GB
dc.identifier.uri | https://www.um.edu.mt/library/oar/handle/123456789/59678 | -
dc.description | M.SC.HUMAN LANG.SC.&TECH. | en_GB
dc.description.abstract | The persistent efforts to create valuable annotated corpora in more diverse, morphologically rich languages have driven NLP research to consider more explicit techniques for incorporating morphological information into the pipeline. Recent efforts have proposed strategies that combine the transducer paradigm with neural architectures, although these ingest one character at a time in a context-agnostic setup. In this thesis, we introduce a technique inspired by the byte-pair-encoding (BPE) compression algorithm in order to obtain transducing actions that resemble word formations more faithfully. Then, we propose a neural transducer architecture that operates over these transducing actions, ingesting one word token at a time and effectively incorporating sentence-level context by encoding per-token action representations in a hierarchical fashion. We investigate the benefit of these word formation representations for the tasks of lemmatization and context-aware morphological tagging for a typologically diverse set of languages. For lemmatization, we investigate an optimization technique that explores possible action sequences and scores them based on task-specific metrics instead of standard log-likelihood. We find that our approach greatly benefits languages that use less commonly studied morphological processes such as templatic processes, with up to 55.73% error reduction in lemmatization for Arabic. Furthermore, we find that projecting these word formation representations into a common multilingual space enables our models to group together action labels signalling the same phenomena in several languages, e.g. plurality, irrespective of the language-specific morphological process that may be involved. For morphological tagging, we investigate the effect of different tagging strategies, e.g. bundle vs. individual tag prediction, as well as the effect of multilingual action representations. We find that our taggers are able to obtain up to 20% error reduction by leveraging multilingual actions with respect to the monolingual scenario. | en_GB
dc.language.iso | en | en_GB
dc.rights | info:eu-repo/semantics/restrictedAccess | en_GB
dc.subject | Natural language processing (Computer science) | en_GB
dc.subject | Artificial intelligence | en_GB
dc.subject | Data mining | en_GB
dc.title | Morphological process transduction : towards interpretable multi-lingual morphological analysis | en_GB
dc.type | masterThesis | en_GB
dc.rights.holder | The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder. | en_GB
dc.publisher.institution | University of Malta | en_GB
dc.publisher.department | Faculty of Information and Communication Technology. Department of Artificial Intelligence | en_GB
dc.description.reviewed | N/A | en_GB
dc.contributor.creator | Cardenas Acosta, Ronald Ahmed | -
Appears in Collections:
Dissertations - FacICT - 2019
Dissertations - FacICTAI - 2019
Dissertations - InsLin - 2019

Files in This Item:
File | Description | Size | Format
Ronald_Cardenas_Acosta.pdf | Restricted Access | 27.5 MB | Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.