Title: Automatic analysis of handwritten text
Authors: Bugeja, Mark
Keywords: ASCII (Character set); Neural networks (Computer science)
Issue Date: 2018
Citation: Bugeja, M. (2018). Automatic analysis of handwritten text (Master's dissertation).
Abstract: Analysing handwritten documents is a challenging task. General solutions are rare in this area, because most handwritten datasets have unique characteristics that reflect how the document was written, including differences in handwriting. These differences are mostly attributed to multiple scribes contributing to the transcription of the text and to degradation of the script. This study presents a unique dataset that has never before been read or analysed. The aim is to develop an adaptive system that tackles two challenges on this dataset: document text segmentation and conversion to machine-readable text. The study goes through the process of converting a document image into a set of segmented components at the lowest level needed to transform the document into ASCII characters. The novel approach used in this dissertation converts the document image without any prior knowledge of the text; the training set is a synthetic dataset built on the Google Fonts database. The approach segments the document into lines, words and finally characters using a number of approaches adapted from the literature. Notably, the line segmentation and the character segmentation yielded positive results, with the line segmentation achieving an overall segmentation accuracy of 92.81%. The final text recognition process is built on machine learning models and deep neural networks using a multilayer perceptron architecture. A unique training set was created to try to classify handwritten text without using a subset of manually labelled characters from the final testing dataset.
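As a rough illustration of the first stage described in the abstract (splitting a page into text lines), the sketch below uses a horizontal projection profile. This is a generic textbook technique chosen for the example and is not necessarily the method used in the dissertation. The function name `segment_lines`, the `min_ink` threshold and the list-of-rows binary image representation are all assumptions introduced here.

```python
from typing import List, Tuple


def segment_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Split a binary page image into text lines (hypothetical helper).

    `image` is a list of pixel rows, where 1 marks ink and 0 marks background.
    A row counts as part of a text line when it holds at least `min_ink`
    ink pixels. Returns (start_row, end_row) pairs, with end_row exclusive.
    """
    lines: List[Tuple[int, int]] = []
    start = None  # first row of the line currently being tracked, if any
    for y, row in enumerate(image):
        has_ink = sum(row) >= min_ink  # projection profile value for this row
        if has_ink and start is None:
            start = y  # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))  # a blank row closes the line
            start = None
    if start is not None:
        lines.append((start, len(image)))  # line runs to the bottom edge
    return lines


# Toy page: two ink bands separated by a blank row.
page = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
bands = segment_lines(page)
```

Applying the same idea to the column sums within each line band would give a first approximation of word and character boundaries, although degraded or overlapping script of the kind the abstract mentions would typically need something more robust.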
Appears in Collections: Dissertations - FacICT - 2018; Dissertations - FacICTAI - 2018

Files in This Item:
Description: Restricted Access
Size: 3.51 MB
Format: Adobe PDF

Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.