A diphone-based Maltese speech synthesis system

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/74891

Full metadata record

DC Field	Value	Language
dc.date.accessioned	2021-04-28T09:51:50Z	-
dc.date.available	2021-04-28T09:51:50Z	-
dc.date.issued	2019	-
dc.identifier.citation	Magro, D. (2019). A diphone-based Maltese speech synthesis system (Bachelor's dissertation).	en_GB
dc.identifier.uri	https://www.um.edu.mt/library/oar/handle/123456789/74891	-
dc.description	B.Sc. ICT (Hons) Artificial Intelligence	en_GB
dc.description.abstract	Although there has been work in the area, at the time of writing no text-to-speech (TTS) systems are available for Maltese, so almost the entire system had to be built from scratch. In light of this, a diphone-based concatenative speech synthesis system was chosen as the type of synthesiser to implement, because it needs only a small amount of data: less than 20 minutes of recorded speech. A simple 'Text Normalisation' component was built, which converts integers between 0 and 9,999 written as numerals into their textual form. While this is far from covering all the possible forms of Non-Standard Words (NSWs) in Maltese, its modular design allows it to be extended easily in future work. A 'Grapheme to Phoneme (G2P)' component, based on an existing implementation by Crimsonwing, was also created; it converts the normalised text into the sequence of phonemes (basic sounds) that make up the text. Three separate diphone databases were made available to the speech synthesiser. The first is FestVox's professionally recorded English diphone database, 'CMU US KAL Diphone'. The second and third were created as part of this work: one with diphones manually extracted from recorded Maltese carrier phrases, the other with diphones extracted automatically using Dynamic Time Warping (DTW). The Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) algorithm was implemented to concatenate the diphones in the sequence specified by the G2P component. On a scale of 1 to 5, evaluators scored the speech synthesised from the manually extracted diphone database with TD-PSOLA concatenation at 2.57 for naturalness, 2.72 for clarity and, most importantly, 3.06 for intelligibility. These scores were higher than those obtained with the professionally recorded English diphone set.	en_GB
dc.language.iso	en	en_GB
dc.rights	info:eu-repo/semantics/restrictedAccess	en_GB
dc.subject	Vision disorders -- Malta	en_GB
dc.subject	Literacy -- Malta	en_GB
dc.subject	Maltese language -- Malta	en_GB
dc.subject	Speech synthesis	en_GB
dc.subject	Speech processing systems	en_GB
dc.title	A diphone-based Maltese speech synthesis system	en_GB
dc.type	bachelorThesis	en_GB
dc.rights.holder	The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder.	en_GB
dc.publisher.institution	University of Malta	en_GB
dc.publisher.department	Faculty of Information and Communication Technology. Department of Artificial Intelligence	en_GB
dc.description.reviewed	N/A	en_GB
dc.contributor.creator	Magro, Daniel (2019)	-
Appears in Collections:	Dissertations - FacICT - 2019; Dissertations - FacICTAI - 2019

Files in This Item:

File	Description	Size	Format
Magro Daniel.pdf Restricted Access		2.45 MB	Adobe PDF
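The abstract above describes a modular text-normalisation component that spells out integers from 0 to 9,999. The Python sketch below is not the dissertation's code; it illustrates one way such modularity could look, assuming a hypothetical list of `(accepts, verbalise)` handler pairs so that new classes of NSW can be plugged in without touching the tokeniser. The Maltese word forms are our own assumption and cover only 0 to 99; hundreds and thousands would be a further handler whose joining rules should be checked against a Maltese grammar, and context-dependent forms used before nouns are ignored.

```python
import re

# Maltese cardinals 0-19 and the tens. These forms are our own assumption
# for illustration; they are not taken from the dissertation.
UNITS = ["żero", "wieħed", "tnejn", "tlieta", "erbgħa", "ħamsa",
         "sitta", "sebgħa", "tmienja", "disgħa", "għaxra",
         "ħdax", "tnax", "tlettax", "erbatax", "ħmistax",
         "sittax", "sbatax", "tmintax", "dsatax"]
TENS = {2: "għoxrin", 3: "tletin", 4: "erbgħin", 5: "ħamsin",
        6: "sittin", 7: "sebgħin", 8: "tmenin", 9: "disgħin"}


def verbalise_0_99(n: int) -> str:
    """Spell out 0 <= n <= 99; compounds put the unit first (21 -> 'wieħed u għoxrin')."""
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    return TENS[tens] if unit == 0 else f"{UNITS[unit]} u {TENS[tens]}"


# A standalone run of digits that is not part of a decimal or grouped number.
NUMERAL = re.compile(r"(?<!\d)(?<!\d[.,])\d+(?!\d)(?![.,]\d)")


def normalise(text, handlers):
    """Replace each numeral with the output of the first handler that accepts it.

    `handlers` is a hypothetical plug-in interface: a list of
    (accepts, verbalise) pairs. Numerals no handler accepts are left as-is.
    """
    def replace(match):
        n = int(match.group())
        for accepts, verbalise in handlers:
            if accepts(n):
                return verbalise(n)
        return match.group()

    return NUMERAL.sub(replace, text)


# Only 0-99 is covered here; 100-9,999 would be an additional handler.
HANDLERS = [(lambda n: 0 <= n <= 99, verbalise_0_99)]
```

With `HANDLERS` as defined, `normalise("għandi 21 sena", HANDLERS)` gives `"għandi wieħed u għoxrin sena"`, while `"2019"` is left unchanged because no handler accepts it yet.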
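The G2P step described in the abstract maps spelling to phonemes. Maltese orthography is close to phonemic, so a greedy longest-match table gives a rough first approximation. The sketch below is a simplified stand-in, not the Crimsonwing implementation the dissertation builds on; the table, the IPA symbols and the choice to treat "għ", "h" and the apostrophe as silent are all assumptions.

```python
# Simplified grapheme-to-phoneme table for Maltese orthography (IPA).
# Assumption for illustration only: this is NOT Crimsonwing's rule set and it
# ignores context effects such as voicing assimilation, word-final devoicing
# and the vowel lengthening associated with għ and h.
G2P_TABLE = {
    "għ": "", "ie": "iː", "'": "",
    "a": "a", "b": "b", "ċ": "tʃ", "d": "d", "e": "ɛ", "f": "f",
    "ġ": "dʒ", "g": "g", "h": "", "ħ": "ħ", "i": "i", "j": "j",
    "k": "k", "l": "l", "m": "m", "n": "n", "o": "ɔ", "p": "p",
    "q": "ʔ", "r": "r", "s": "s", "t": "t", "u": "u", "v": "v",
    "w": "w", "x": "ʃ", "ż": "z", "z": "ts",
}
MAX_GRAPHEME = max(len(g) for g in G2P_TABLE)


def g2p(word):
    """Greedy longest-match conversion of one lower-case word to phonemes."""
    phones, i = [], 0
    while i < len(word):
        # Try the longest grapheme first so that "għ" wins over "g".
        for size in range(MAX_GRAPHEME, 0, -1):
            grapheme = word[i:i + size]
            if grapheme in G2P_TABLE:
                if G2P_TABLE[grapheme]:  # silent graphemes add nothing
                    phones.append(G2P_TABLE[grapheme])
                i += len(grapheme)
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones
```

For example, `g2p("ieħor")` returns `["iː", "ħ", "ɔ", "r"]`. A real G2P component would add context-sensitive rules on top of such a table.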
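Once the G2P component has produced a phoneme sequence, the synthesiser needs the matching diphones, each spanning from the middle of one phone to the middle of the next. A minimal sketch, assuming a hypothetical silence label `pau` and a hypothetical database index that maps labels such as `"ʃ-ɛ"` to a file and time span; the actual database format used in the dissertation is not described in the abstract.

```python
SILENCE = "pau"  # hypothetical label for the silence unit


def to_diphones(phones, silence=SILENCE):
    """Pad the phoneme sequence with silence and pair each phone with its successor."""
    padded = [silence, *phones, silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def find_units(diphones, database):
    """Look up each diphone in `database`, a hypothetical index mapping
    labels to (wav_file, start_s, end_s) tuples; fail if any are missing."""
    missing = [d for d in diphones if d not in database]
    if missing:
        raise KeyError(f"diphones not in database: {missing}")
    return [database[d] for d in diphones]
```

`to_diphones(["ʃ", "ɛ", "m", "ʃ"])` returns `["pau-ʃ", "ʃ-ɛ", "ɛ-m", "m-ʃ", "ʃ-pau"]`. Failing loudly on missing labels makes coverage gaps visible, which matters when the same phoneme sequence is rendered with different diphone databases.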
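The abstract states that one Maltese database was built by extracting diphones automatically with DTW, but does not say how the alignment was set up. One common approach, assumed here, is to align a new recording against a reference whose diphone boundaries are already known and carry those boundaries across the warping path. The sketch implements textbook DTW over feature frames (for instance MFCCs, which are not computed here); `map_boundary` is a hypothetical helper.

```python
import numpy as np


def _as_frames(x):
    """Treat a 1-D sequence as one feature per frame."""
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x


def dtw(reference, query):
    """Classic DTW with Euclidean frame distance and match/insert/delete steps.

    Returns the accumulated cost and the warping path as (ref_idx, query_idx) pairs.
    """
    ref, qry = _as_frames(reference), _as_frames(query)
    n, m = len(ref), len(qry)
    dist = np.linalg.norm(ref[:, None, :] - qry[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): acc[i - 1, j - 1],
                 (i - 1, j): acc[i - 1, j],
                 (i, j - 1): acc[i, j - 1]}
        i, j = min(steps, key=steps.get)
    return float(acc[n, m]), path[::-1]


def map_boundary(path, ref_frame):
    """Map a labelled boundary frame in the reference to the first aligned query frame."""
    return min(q for r, q in path if r == ref_frame)
```

Aligning `[0, 1, 2, 3]` with `[0, 0, 1, 2, 3]` gives a cost of 0 and maps reference frame 1 to query frame 2, since the query's first frame is repeated.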
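TD-PSOLA, which the abstract names as the concatenation algorithm, works by cutting the waveform into grains two pitch periods long, centred on pitch marks, and overlap-adding them at new mark positions. The sketch below shows only that core operation on a single signal with known pitch marks; pitch-mark detection, duration control and the joining of consecutive diphones are omitted, and mapping each synthesis mark to the nearest analysis mark is a simplification we chose, not necessarily the dissertation's.

```python
import numpy as np


def periodic_hann(length):
    """Periodic Hann window: copies spaced length/2 apart sum exactly to 1."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(length) / length)


def td_psola(x, analysis_marks, synthesis_marks, out_len):
    """Minimal TD-PSOLA: for each synthesis pitch mark, take the two-period,
    Hann-windowed grain around the nearest analysis mark and overlap-add it."""
    x = np.asarray(x, dtype=float)
    marks = np.asarray(analysis_marks, dtype=int)  # needs at least two marks
    y = np.zeros(out_len)
    for t in synthesis_marks:
        i = int(np.argmin(np.abs(marks - t)))  # nearest analysis mark
        # Local pitch period, taken from the neighbouring analysis mark.
        p = int(marks[i] - marks[i - 1] if i > 0 else marks[1] - marks[0])
        start, end = int(marks[i]) - p, int(marks[i]) + p
        t = int(t)
        if start < 0 or end > len(x) or t - p < 0 or t + p > out_len:
            continue  # skip grains that would run off either signal
        y[t - p:t + p] += x[start:end] * periodic_hann(2 * p)
    return y
```

With a constant period and synthesis marks equal to the analysis marks, the windows sum to exactly one, so the signal is reconstructed unchanged between the first and last marks. Spacing the synthesis marks more closely raises the pitch: with analysis marks every 50 samples, synthesis marks every 40 samples raise it by a factor of 1.25 while keeping the duration.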
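The evaluation in the abstract reports average scores on a 1 to 5 scale. A sketch of how such a mean opinion score and an approximate 95% confidence interval can be computed; the normal approximation is our choice and not necessarily the method used in the dissertation, and the ratings below are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mos(ratings, z=1.96):
    """Mean opinion score with a normal-approximation confidence interval."""
    if not ratings:
        raise ValueError("no ratings given")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1 to 5 scale")
    m = mean(ratings)
    half = z * stdev(ratings) / sqrt(len(ratings)) if len(ratings) > 1 else 0.0
    return m, (m - half, m + half)
```

For the hypothetical ratings `[2, 3, 3, 4]`, `mos` returns a mean of 3 with an interval of roughly 2.2 to 3.8.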
