Automated identification of plots and story structure in unstructured documents

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/95231

Title:	Automated identification of plots and story structure in unstructured documents
Authors:	Tanti, Alex (2011)
Keywords:	Information technology; Computer simulation
Issue Date:	2011
Citation:	Tanti, A. (2011). Automated identification of plots and story structure in unstructured documents (Bachelor's dissertation).
Abstract:	Modern information retrieval systems such as search engines are adopting the philosophy of "The less information, the better". This means that if, for example, a user makes a query for the date of birth of the Maltese prime minister, he or she should be given the actual date of birth and not a list of documents about Dr. George Abela. Thus, a whole research area exists which deals with specific information within a document, and this is referred to as information extraction (IE). Text classification (TC) is another area in human language technology (HLT) that is becoming increasingly popular. This area deals with the process of assigning a label to a particular piece of text according to unique features found in the text. The research in this thesis is based on how techniques from the two branches just mentioned can be applied to literature-oriented works in the English language. In a nutshell, the system takes a novel written in English as input and returns particular information such as the main characters and their type, the plot type of the story and the character interactions found in the story. To our knowledge, no research of this sort had been done before, which made the work more challenging since all the ideas had to be designed from scratch. The system was evaluated against a human gold standard under two distinct scenarios. The first scenario included pronouns that refer to characters (anaphora resolution) as valid character instances in the text, thus increasing the frequency of occurrences of each character in the text. The second scenario excluded pronouns. Considering the lack of research papers found on the subject, the combined F-measure results of the main tasks are fairly satisfactory: 52% when including pronouns and 57% when excluding pronouns.
Description:	B.SC.(HONS)IT
URI:	https://www.um.edu.mt/library/oar/handle/123456789/95231
Appears in Collections:	Dissertations - FacICT - 2011

Files in This Item:

File	Description	Size	Format
BSC(HONS)ICT_Tanti, Alex_2011.PDF Restricted Access		14.34 MB	Adobe PDF
