Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/92179
| Title: | Deep learning techniques for classifying sentiment in social media postings |
| Authors: | Grech, Nathan (2021) |
| Keywords: | Social media; Sentiment analysis; Deep learning (Machine learning) |
| Issue Date: | 2021 |
| Citation: | Grech, N. (2021). Deep learning techniques for classifying sentiment in social media postings (Bachelor's dissertation). |
| Abstract: | Nowadays, social media users generate large volumes of sentiment-rich data in various forms, such as tweets and comments. Such data can cover a vast range of topics, including product reviews, politics, stock markets, and investor opinions. The large number of potential use-cases arising from analysing the sentiment of such data has led to enormous interest in this domain, with research constantly advancing. Using the SemEval-2017 Task 4A English competition as a medium, this research explores the application of deep learning techniques for classifying Twitter postings into three polarity classes: "positive", "neutral", and "negative". The first part of this work investigates the LSTM-based model and the technical choices made by DataStories, a team that came joint first in the aforementioned SemEval competition, with the principal aim of optimising their model to improve its out-of-sample classification performance. Secondly, this research also trains, optimises, and evaluates the novel Transformer-based BERT and RoBERTa models, with the goal of discovering whether the previously observed superiority of such methods over traditional deep learning methods in various natural language processing tasks also holds for Twitter sentiment analysis. Since this project entails testing many different model configurations, the constructed systems are implemented using a grid-search framework to facilitate mass experimentation. The experiments conducted include investigating different validation-set evaluation metrics and epoch selection strategies, exploring seed sensitivity, researching the effects of different train-validation split ratios, and hyperparameter optimisation. All of the model configurations developed in this work outperform the original DataStories model configuration on the SemEval-2017 Task 4A English test set. Through a combination of different random seeds and hyperparameter values, a more optimised configuration of the DataStories model was developed. Furthermore, the BERT and RoBERTa models significantly outperform the LSTM-based DataStories model configurations, further confirming these models' superiority on natural language processing tasks. |
| Description: | B.Sc. IT (Hons)(Melit.) |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/92179 |
| Appears in Collections: | Dissertations - FacICT - 2021; Dissertations - FacICTCIS - 2021 |
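The abstract describes a grid-search framework used for mass experimentation over random seeds and hyperparameter values. A minimal sketch of that style of search is shown below; the function and parameter names are hypothetical (the dissertation's actual code is not available here), and the scoring function is a stand-in for training and evaluating a sentiment model on a validation set.

```python
# Hypothetical sketch of grid-search mass experimentation: every
# combination of random seed and hyperparameter values is "trained"
# and scored, and the best configuration by validation macro-F1 is kept.
import itertools
import random


def train_and_score(seed, lr, dropout):
    """Stand-in for training a sentiment classifier and returning its
    validation macro-F1; a real implementation would fit a model here."""
    rng = random.Random(hash((seed, lr, dropout)))
    return round(rng.uniform(0.55, 0.70), 4)


def grid_search(seeds, learning_rates, dropouts):
    """Evaluate the full Cartesian product of configurations."""
    results = []
    for seed, lr, dropout in itertools.product(seeds, learning_rates, dropouts):
        score = train_and_score(seed, lr, dropout)
        results.append({"seed": seed, "lr": lr, "dropout": dropout, "macro_f1": score})
    # Select the configuration with the highest validation macro-F1.
    best = max(results, key=lambda r: r["macro_f1"])
    return results, best


if __name__ == "__main__":
    results, best = grid_search(seeds=[1, 2, 3],
                                learning_rates=[1e-3, 1e-4],
                                dropouts=[0.1, 0.3])
    print(f"ran {len(results)} configurations; best: {best}")
```

In practice each grid cell would also record the epoch selection strategy and train-validation split ratio mentioned in the abstract, so that seed sensitivity can be examined per configuration rather than per run.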
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 21BITSD017.pdf (Restricted Access) | | 8.32 MB | Adobe PDF | View/Open Request a copy |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
