Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/47892
Title: | Machine learning techniques for e-mail categorization |
Authors: | Stellini, Steve |
Keywords: | Machine learning; Electronic mail systems |
Issue Date: | 2019 |
Citation: | Stellini, S. (2019). Machine learning techniques for e-mail categorization (Bachelor's dissertation). |
Abstract: | It is estimated that, worldwide, around 246.5 billion e-mails are sent every day [1]. E-mail communication is used by billions of people daily and is a mission-critical application for many businesses [1]. E-mail users often feel that they are ‘drowning’ in e-mails and start to lose track of what types of e-mails they have in their inbox. The principal aim of this study is to investigate the effect that text pre-processing, input encodings, and feature selection techniques have on the performance of various machine learning algorithms for the automatic categorization of e-mails into given classes. As a reference point we used the work of Padhye [2] who, in her Master’s dissertation, compared the performance of three supervised machine learning algorithms (Support Vector Machine, Naive Bayes, and J48) for e-mail categorization. Padhye used the Enron e-mail dataset, which she labelled manually, and implemented the three algorithms with the WEKA libraries. Using Padhye’s results as a baseline, we experimented with different encoding schemes and feature selection techniques, and achieved significant improvements over the results reported in [2]. We also propose a novel classification algorithm that uses pre-built class models to represent the classes in the dataset. During classification, an unseen e-mail is compared to each class model, and each model is given a score according to its similarity to the e-mail. The e-mail is assigned to the class whose model obtains the highest score. |
Description: | B.Sc. Software Development |
URI: | https://www.um.edu.mt/library/oar/handle/123456789/47892 |
Appears in Collections: | Dissertations - FacICT - 2019; Dissertations - FacICTCIS - 2019 |
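
The class-model classification step outlined in the abstract can be illustrated with a minimal sketch. The record does not say how the class models are built or how similarity is measured, so everything below is an assumption made for illustration: each class model is approximated by a bag-of-words term-frequency profile, similarity is cosine similarity, and the function names and toy e-mails are hypothetical (they are not taken from the dissertation or from the Enron dataset).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (assumed pre-processing)."""
    return re.findall(r"[a-z]+", text.lower())


def build_class_models(labelled_emails):
    """Build one term-frequency profile per class.

    Assumption: a 'class model' is the summed token counts of all
    training e-mails carrying that label.
    """
    models = defaultdict(Counter)
    for text, label in labelled_emails:
        models[label].update(tokenize(text))
    return dict(models)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(email_text, models):
    """Score the e-mail against every class model and pick the highest score."""
    email_vec = Counter(tokenize(email_text))
    scores = {
        label: cosine_similarity(email_vec, model)
        for label, model in models.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical toy training data, for illustration only.
training = [
    ("Meeting agenda for the quarterly budget review", "business"),
    ("Please approve the budget and contract draft", "business"),
    ("Dinner with family this weekend", "personal"),
    ("Happy birthday, see you at the party this weekend", "personal"),
]

models = build_class_models(training)
label, scores = classify("Can you review the contract before the meeting?", models)
print(label, scores)
```

Keeping one pre-built profile per class means an unseen e-mail only needs one similarity computation per class, which mirrors the "compare to each class model, take the highest score" description; any other model representation or similarity measure could be substituted without changing that overall shape.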
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
19BITSD016.pdf (Restricted Access) | | 2.22 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.