Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/10988
| Title: | Sentiment analysis in Maltese |
| Authors: | Sant, Nicole Jessica |
| Keywords: | Natural language processing (Computer science); Computational linguistics; Public opinion -- Data processing; Data mining |
| Issue Date: | 2015 |
| Abstract: | In today's modern age, the use of the web and social media sites for communication is on the rise, with more and more people each day making use of such facilities to express their opinion about one thing or another. This phenomenon has spread rapidly throughout the Maltese islands over the past few years, particularly on social media websites such as Facebook or online gazettes, where hundreds of reviews are posted on a daily basis by citizens wishing to make their voice heard about various subjects. Sentiment analysis refers to the task of analysing such reviews and classifying them as positive, negative or neutral, according to the overall sentiment of the opinion being expressed. In this FYP, we present a novel system capable of performing such a task for text written in Maltese. We propose a supervised machine learning, context-based approach by which we aim to not only determine the optimal algorithm and parameters for achieving the best results possible with our system, but also to surpass a baseline accuracy of 34% obtained by a random classifier and reach that of 64% obtained through manually designed rules. Our system consists of two main components, both capable of performing preprocessing, feature extraction and classification of text written in Maltese at a context window level. One follows the more traditional machine learning approach, where features are hand-crafted and passed on to classification algorithms, while the other performs unsupervised feature extraction and makes use of deep learning classifiers to categorize the text. Through experimentation we determined that a Random Forest classifier in conjunction with 80% of our dataset for training and a 4-word context window was the optimal scenario to achieve the best results, and were successful in not only surpassing the baseline accuracy but also achieving a 62.3% accuracy through the use of the aforementioned classifier and parameters. |
| Description: | B.SC.IT(HONS) |
| URI: | https://www.um.edu.mt/library/oar//handle/123456789/10988 |
| Appears in Collections: | Dissertations - FacICT - 2015; Dissertations - FacICTAI - 2015 |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 15BSCIT036.pdf (Restricted Access) | | 2.58 MB | Adobe PDF | |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
