Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/74891
Full metadata record
DC Field | Value | Language
dc.date.accessioned | 2021-04-28T09:51:50Z | -
dc.date.available | 2021-04-28T09:51:50Z | -
dc.date.issued | 2019 | -
dc.identifier.citation | Magro, D. (2019). A diphone-based Maltese speech synthesis system (Bachelor's dissertation). | en_GB
dc.identifier.uri | https://www.um.edu.mt/library/oar/handle/123456789/74891 | -
dc.description | B.SC.ICT(HONS)ARTIFICIAL INTELLIGENCE | en_GB
dc.description.abstract | While there has been work in the area, at the time of writing there are no available TTS systems for Maltese, so almost the entire system had to be built from scratch. In light of this, a Diphone-Based Concatenative Speech System was chosen as the type of synthesiser to implement, because it needs minimal data: less than 20 minutes of recorded speech. A simple 'Text Normalisation' component was built, which converts integers between 0 and 9,999 written as numerals to their textual form. While this is far from covering all the possible forms of Non-Standard Words (NSWs) in Maltese, its modular design allows for easy upgrading in future work. A 'Grapheme to Phoneme (G2P)' component, based on an existing implementation by Crimsonwing, was also created; it converts the normalised text into the sequence of phonemes (basic sounds) that make up the text. Three separate 'Diphone Databases' were made available to the speech synthesiser. One is FestVox's professionally recorded English diphone database, 'CMU US KAL Diphone'. The second and third were created as part of this work: one with diphones manually extracted from the recorded carrier phrases in Maltese, the other with diphones automatically extracted using Dynamic Time Warping (DTW). The Time Domain Pitch Synchronous Overlap Add (TD-PSOLA) concatenation algorithm was implemented to string together the diphones in the sequence specified by the G2P component. On a scale of 1 to 5, evaluators scored the speech synthesised from the manually extracted diphone database, concatenated by the TD-PSOLA algorithm, 2.57 for naturalness, 2.72 for clarity and, most importantly, 3.06 for intelligibility. These scores were higher than those obtained when using the professionally recorded English diphone set. | en_GB
dc.language.iso | en | en_GB
dc.rights | info:eu-repo/semantics/restrictedAccess | en_GB
dc.subject | Vision disorders -- Malta | en_GB
dc.subject | Literacy -- Malta | en_GB
dc.subject | Maltese language -- Malta | en_GB
dc.subject | Speech synthesis | en_GB
dc.subject | Speech processing systems | en_GB
dc.title | A diphone-based Maltese speech synthesis system | en_GB
dc.type | bachelorThesis | en_GB
dc.rights.holder | The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder. | en_GB
dc.publisher.institution | University of Malta | en_GB
dc.publisher.department | Faculty of Information and Communication Technology. Department of Artificial Intelligence | en_GB
dc.description.reviewed | N/A | en_GB
dc.contributor.creator | Magro, Daniel (2019) | -
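
Illustrative note on the abstract: it mentions that one diphone database was extracted automatically using Dynamic Time Warping (DTW), but does not say which signals were aligned or which features were used. The sketch below is therefore only a generic, minimal DTW in Python, assuming one-dimensional feature sequences and an absolute-difference local cost. The names `dtw_path` and `map_boundary` are hypothetical and are not taken from the dissertation.

```python
from math import inf


def dtw_path(a, b):
    """Align sequences a and b with classic DTW.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    meaning a[i] is aligned with b[j].
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # match
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
            )
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def map_boundary(path, ref_index):
    """Return the first index in b that the path aligns with a[ref_index]."""
    for i, j in path:
        if i == ref_index:
            return j
    raise ValueError("ref_index not on the warping path")


if __name__ == "__main__":
    reference = [0, 1, 2, 3]     # toy labelled sequence
    recording = [0, 1, 1, 2, 3]  # same shape, one frame stretched
    total, path = dtw_path(reference, recording)
    print(total, path)
    print(map_boundary(path, 2))
```

One common use of such an alignment, stated here as an assumption rather than as the author's method, is to carry a segment boundary marked in a labelled reference over to the matching position in a new recording, which is what `map_boundary` does on the toy data.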
Appears in Collections:Dissertations - FacICT - 2019
Dissertations - FacICTAI - 2019

Files in This Item:
File | Description | Size | Format
Magro Daniel.pdf | Restricted Access | 2.45 MB | Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.