Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/141884
Title: Datasets and models for authorship attribution on Italian personal writings
Authors: Ruggiero, Gaetana (2021)
Keywords: Authorship
Italian language -- Malta
Data sets
Italian language -- Written Italian
Issue Date: 2021
Citation: Ruggiero, G. (2021). Datasets and models for authorship attribution on Italian personal writings (Master's dissertation).
Abstract: Authorship Attribution (AA) is the study of identifying authors by their writing style. Over the past few years, determining the authors of online content has played a crucial role in many fields, such as online security, plagiarism detection and fake news identification. While extensive research has been done in this field for English, little investigation has focused on Italian, with the only outstanding case being the study on Elena Ferrante’s true identity. Existing research on AA focuses on texts for which a lot of data is available (e.g. novels, articles), and which, because of editorial interventions, do not necessarily reflect an author’s personal writing style. This study approaches the AA task in terms of Authorship Verification (AV), a binary classification task where, given two texts, the goal is to decide whether or not they were written by the same author. Following Hürlimann et al. (2015) and inspired by the work on blogger identification of Mohtasseb et al. (2009), we run the GLAD AV system on Italian forum comments and personal diaries. We introduce two novel datasets suitable for the AV task, which can be easily adapted to work with other AA tasks. We show the complexity of the data, and analyze the interaction between four different variables, i.e. genre, topic, authors’ gender and number of words taken into account per author. We perform intra-topic, cross-topic and cross-genre experiments and discuss the results obtained for each setting. We show that AV is feasible even with little data, but more evidence helps. Gender and topic can be indicative clues, and if not controlled for, they might overshadow more specific aspects of personal style. We also show that, contrary to what other studies have shown (Sapkota et al., 2014; Stamatatos et al., 2015), cross-topic and cross-genre experiments are comparable to intra-topic ones.
Description: M.Sc. (HLST)(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/141884
Appears in Collections: Dissertations - FacICT - 2021
Dissertations - FacICTAI - 2021

Files in This Item:
File: 2118ICTCSA531005064906_1.PDF (Restricted Access)
Size: 1.59 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.