Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/14744
Title: From tweets to narratives : using tweets to generate news reports
Authors: Distefano, Marco
Keywords: Social media
Natural language processing (Computer science)
Public opinion -- Data processing
Issue Date: 2016
Abstract: Over the last decade or so, social media websites have begun to play an important role in the dissemination of all types of news and information. One of the main attractions of using social media is the ability to generate content, access information and potentially reach large audiences. With user-generated content constantly being created and distributed across social media platforms, the opportunities to harvest and analyse such data and information have opened up new possibilities within a number of different fields of research. Twitter, for instance, boasts three hundred and ten million monthly active users. One could imagine that an intelligent system may be able to read through a set of tweets and return the information and messages conveyed within the tweets in the form of a news report. Due to a fascination with the field of natural language generation and a strong interest in social media and sentiment analysis, the idea of generating a text which is able to summarise the facts and opinions found in tweets seemed to offer a better alternative to reading through a set of tweets individually. This dissertation focuses on the design and implementation of an NLG system which is able to identify a set of tweets using a particular query, collect such tweets, and generate an output text. The text produced by the system takes the form of a news report narrative where the most frequently mentioned facts within the tweets are incorporated within the report generated. As part of the dissertation, a study was undertaken to investigate a possible way of grouping Twitter topics according to their hashtags, to attempt to collect similar tweets with different hashtags. The system was implemented using Python and Java, with each tweet being dependency parsed to extract and define relations between words and phrases. Two different approaches were used to convert the dependency parses of the individual tweets into meaningful content to be used within the texts. 
Sentiment analysis was also added to the project, with user sentiment across tweets reported within the generated news report. The system returned some interesting output texts which were used within a human-based evaluation to judge the fluency and content of the news reports. Although the texts generated were not of a very high standard, the evaluation concluded that they were of a satisfactory level, with the best text produced according to the evaluation achieving average scores of 5.71 and 5.64 (out of 10) for its fluency and content respectively. The results obtained through the two different approaches used to determine the content to generate are also analysed in detail, with one approach achieving better overall results and potentially highlighting a way forward for such a system to be used and developed further.
Description: M.SC.LANG.&COMPUTATION
URI: https://www.um.edu.mt/library/oar/handle/123456789/14744
Appears in Collections: Dissertations - InsLin - 2016

Files in This Item:
File: 16MSCLC001.pdf (Restricted Access)
Size: 2.15 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.