Noise robust speech recognition techniques

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/77833

Title:	Noise robust speech recognition techniques
Authors:	Gauci, Oliver (2008)
Keywords:	Automatic speech recognition; Noise; Support vector machines; Markov processes
Issue Date:	2008
Citation:	Gauci, O. (2008). Noise robust speech recognition techniques (Master's dissertation).
Abstract:	Speech recognition systems in controlled environments have reached a very high level of accuracy. However, when these systems are applied in noisy environments, such as on wireless devices, the performance of most state-of-the-art speech recognisers degrades significantly. The portability of many devices in common use poses considerable technical challenges for speech recognition technology. In some applications, the noise level may vary from +30 dB to -10 dB, and the background noise can range from stationary to highly non-stationary. There is also no guarantee that two consecutive utterances will be spoken in the same or even similar noise conditions. In speech recognition systems this variability results in a mismatch between the training and operating environments, which degrades the accuracy of speech recognisers. In this dissertation, algorithms to improve the accuracy of speech recognition systems in noisy environments are implemented. The algorithms, which fall into four categories (inherently robust feature parameters, speech enhancement, model-based techniques and missing feature approaches), were tested using the benchmark AURORA 2 database. Moreover, four novel algorithms were proposed to increase the robustness of existing speech recognition systems at different stages of the recognition process. In the first approach, a centered binary tree of SVMs (c-BTS) was presented to increase robustness at the classification stage. To enhance the accuracy, the binary tree was built using a robust posterior probability measure based on the sigmoid function. In the second algorithm, a non-linear feature extraction method which extracts Amplitude Modulation - Frequency Modulation (AM-FM) features was presented. When the Gammachirp filter was used to isolate a single speech resonance and the Dyn operator to extract the features, a consistent improvement over the standard Mel-frequency cepstral coefficients (MFCC) was observed.
The third proposed algorithm is based on a subspace approach to speech quality improvement. The eigendecomposition, which was originally performed in the input space, is instead carried out in a reproducing kernel Hilbert space, where the nonlinearities of speech can be taken into account. Although the approach was originally proposed for speech enhancement, a significant improvement was also observed when it was used to increase the accuracy of the speech recognition system. Finally, in the last paper, a Voice Activity Detection (VAD) algorithm was proposed. Teager Energy Cepstral Coefficients were used for feature extraction and Gaussian Mixture Models for the classification of speech and silence periods. When compared to a state-of-the-art VAD algorithm, the proposed solution achieves better accuracy and significantly reduces clipping of speech periods, thus achieving superior signal quality.
Description:	M.PHIL.
URI:	https://www.um.edu.mt/library/oar/handle/123456789/77833
Appears in Collections:	Dissertations - FacICT - 1999-2009

Files in This Item:

File	Description	Size	Format
M.PHIL._Gauci_Oliver_2008.pdf	Restricted Access	18.46 MB	Adobe PDF
