Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/79175
Title: Phrase extraction for machine translation
Authors: Bajada, Jo-Ann (2010)
Keywords: Machine translating
Natural language processing (Computer science)
Algorithms
Issue Date: 2010
Citation: Bajada, J. A. (2010). Phrase extraction for machine translation (Master’s dissertation).
Abstract: Statistical Machine Translation (SMT) developed in the late 1980s, based initially upon a word-to-word translation process. However, such processes have difficulties when good quality translation is not strictly word-to-word. Easy cases can be handled by allowing insertion and deletion of single words, but for more general word reordering phenomena, a more general translation process is required. There is currently much interest in phrase-to-phrase models, which can overcome this problem, but these require that candidate phrases, together with their translations, be identified in the training corpora. Since phrase delimiters are not explicit, this gives rise to a new problem: that of phrase pair extraction. The current project proposes a phrase extraction algorithm which uses a fixed-length window of words around a source and target word pair to extract equivalent phrases. The extracted phrases together with their probabilities are used as input to an existing Machine Translation (MT) system for the purpose of evaluating the phrase extraction algorithm. The remainder of this document goes on to describe in detail the steps taken in building the phrase alignment system.
Description: M.SC.COMP.SCI.&ARTIFICIAL INTELLIGENCE
URI: https://www.um.edu.mt/library/oar/handle/123456789/79175
Appears in Collections: Dissertations - FacICT - 2010
Dissertations - FacICTAI - 2002-2014

Files in This Item:
File: M.SC._Bajada_Jo-Ann_2010.pdf (Restricted Access), Size: 13.21 MB, Format: Adobe PDF
File: Bajada_Jo-Ann_acc.material.pdf (Restricted Access), Size: 64.46 kB, Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.