Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/74891
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.date.accessioned | 2021-04-28T09:51:50Z | - |
dc.date.available | 2021-04-28T09:51:50Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | Magro, D. (2019). A diphone-based Maltese speech synthesis system (Bachelor's dissertation). | en_GB |
dc.identifier.uri | https://www.um.edu.mt/library/oar/handle/123456789/74891 | - |
dc.description | B.SC. ICT (HONS) ARTIFICIAL INTELLIGENCE | en_GB |
dc.description.abstract | While there has been work in the area, at the time of writing there are no available text-to-speech (TTS) systems for Maltese, so almost the entire system had to be built from scratch. In light of this, a diphone-based concatenative speech synthesiser was chosen as the type of system to implement, because it needs very little data: less than 20 minutes of recorded speech. A simple 'Text Normalisation' component was built, which converts integers between 0 and 9,999 written as numerals to their textual form. While this is far from covering all the possible forms of Non-Standard Words (NSWs) in Maltese, its modular design allows for easy upgrading in future work. A 'Grapheme to Phoneme (G2P)' component, which converts the normalised text into the sequence of phonemes (basic sounds) that make up the text, was also created, based on an existing implementation by Crimsonwing. Three separate 'Diphone Databases' were made available to the speech synthesiser. One of these is FestVox's professionally recorded English diphone database, 'CMU US KAL Diphone'. The second and third were created as part of this work: one with diphones manually extracted from the recorded carrier phrases in Maltese, the other with diphones automatically extracted using Dynamic Time Warping (DTW). The Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) concatenation algorithm was implemented to string together the diphones in the sequence specified by the G2P component. On a scale of 1 to 5, evaluators scored the speech synthesised from the manually extracted diphone database, concatenated by the TD-PSOLA algorithm, at 2.57 for naturalness, 2.72 for clarity and, most importantly, 3.06 for intelligibility. These scores were higher than those obtained when using the professionally recorded English diphone set. | en_GB |
dc.language.iso | en | en_GB |
dc.rights | info:eu-repo/semantics/restrictedAccess | en_GB |
dc.subject | Vision disorders -- Malta | en_GB |
dc.subject | Literacy -- Malta | en_GB |
dc.subject | Maltese language -- Malta | en_GB |
dc.subject | Speech synthesis | en_GB |
dc.subject | Speech processing systems | en_GB |
dc.title | A diphone-based Maltese speech synthesis system | en_GB |
dc.type | bachelorThesis | en_GB |
dc.rights.holder | The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder. | en_GB |
dc.publisher.institution | University of Malta | en_GB |
dc.publisher.department | Faculty of Information and Communication Technology. Department of Artificial Intelligence | en_GB |
dc.description.reviewed | N/A | en_GB |
dc.contributor.creator | Magro, Daniel (2019) | - |
Appears in Collections: | Dissertations - FacICT - 2019; Dissertations - FacICTAI - 2019 |
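
The abstract states that one of the Maltese diphone databases was extracted automatically using Dynamic Time Warping (DTW). The following is a minimal sketch of the DTW alignment step only, assuming one-dimensional feature sequences and an absolute-difference local cost. The features, cost function and boundary-transfer procedure actually used in the dissertation are not given in this record, and the function name `dtw_align` is hypothetical.

```python
def dtw_align(ref, query):
    """Align two 1-D sequences with classic DTW.

    Hypothetical helper for illustration only; not taken from the dissertation.
    Returns (total_cost, path), where path is a list of (ref_idx, query_idx).
    """
    n, m = len(ref), len(query)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning ref[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - query[j - 1])  # assumed local cost
            cost[i][j] = d + min(cost[i - 1][j - 1],  # match
                                 cost[i - 1][j],      # ref advances alone
                                 cost[i][j - 1])      # query advances alone
    # Backtrack from the final cell to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


total, path = dtw_align([1, 2, 3], [1, 2, 2, 3])
print(total, path)
```

In a setting like the one the abstract describes, a path of this kind could be used to map a known boundary index in a reference sequence onto the corresponding frames of a new recording; that use is an assumption here, not a description of the author's method.
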
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Magro Daniel.pdf (Restricted Access) | - | 2.45 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.