Trajectory space factorisation for vision-based automated sign language recognition

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/70724

Title:	Trajectory space factorisation for vision-based automated sign language recognition
Authors:	Borg, Mark (2020)
Keywords:	Sign language; Human-computer interaction; Computer vision; Pattern recognition systems
Issue Date:	2020
Citation:	Borg, M. (2020). Trajectory space factorisation for vision-based automated sign language recognition (Doctoral dissertation).
Abstract:	Sign languages are the main communication methods used by deaf communities around the world. These languages exhibit rich linguistic structure, and primarily convey semantic meaning via the spatial location of the hands, the hand shapes, and their motion. Automatic sign language recognition is a very challenging task, with current state-of-the-art systems achieving high word error rates of approximately 1 in 4 signs. In our work, we use signing videos acquired with off-the-shelf cameras. We investigate whether the structured patterns of motion exhibited by the signer’s hands, which should follow the linguistic rules and constraints of sign languages, can be exploited both to help overcome the challenges faced by the low-level computer vision (perception) processes and to support sign recognition. In our experiments, we make novel use of a structure-from-motion technique to recover the trajectories of the signer’s hands. We then map these trajectories to a DCT-based trajectory space, from which semantically meaningful subunits are extracted. We utilise our novel subunits as features within both a traditional computer vision pipeline and a deep learning framework, showing their versatility. Compared to traditional sign recognition systems, our approach achieves a higher performance, exceeding the state-of-the-art accuracy by 4%. Our deep learning based approach achieves an improvement in accuracy of 10.2% over existing state-of-the-art deep learning systems, and is just 1.3% shy of the best deep learning approach. We also demonstrate that our approach offers better explainability through the use of our phonological subunits, in contrast to the majority of deep learning systems, which either do not employ subunits or use subunits that lack this property.
We further show that another advantage of our approach is its ‘transferability’, allowing us to train our system on a sign language for which a large training corpus is available, and then fine-tune the trained system for another language with a limited training corpus, such as the Maltese Sign Language (LSM).
Description:	PH.D
URI:	https://www.um.edu.mt/library/oar/handle/123456789/70724
Appears in Collections:	Dissertations - FacEng - 2020

Files in This Item:

File	Description	Size	Format
20PHDENG002.pdf		58.52 MB	Adobe PDF
