Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/47892
Title: Machine learning techniques for e-mail categorization
Authors: Stellini, Steve
Keywords: Machine learning; Electronic mail systems
Issue Date: 2019
Citation: Stellini, S. (2019). Machine learning techniques for e-mail categorization (Bachelor's dissertation).
Abstract: It is estimated that around 246.5 billion e-mails are sent worldwide every day [1]. E-mail is used by billions of people and is a mission-critical application for many businesses [1]. E-mail users often feel that they are ‘drowning’ in e-mails and lose track of what types of e-mail they have in their inbox. The principal aim of this study is to investigate the effect that text pre-processing, input encodings, and feature selection techniques have on the performance of various machine learning algorithms for the automatic categorization of e-mails into given classes. As a reference point we used the work of Padhye [2], who, in her Master's dissertation, compared the performance of three supervised machine learning algorithms (Support Vector Machine, Naive Bayes, and J48) for e-mail categorization. Padhye used the Enron e-mail dataset, which she labelled manually, and implemented the three algorithms with the WEKA libraries. Using Padhye's results as a baseline, we experimented with different encoding schemes and feature selection techniques and achieved significant improvements over the results reported in [2]. We also propose a novel classification algorithm that uses pre-built class models to represent the different classes in the dataset. During classification, an unseen e-mail is compared to each class model, and each model receives a score according to the similarity between the e-mail and the model. The e-mail is assigned to the category whose class model obtains the highest score.
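The class-model classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a class model is built or which similarity measure is used, so this sketch assumes, purely for illustration, that each class model is the summed term-frequency profile of that class's training e-mails and that similarity is cosine similarity. The function names (`tokenize`, `build_class_models`, `cosine`, `classify`) and the toy training e-mails are hypothetical and are not taken from the dissertation or the Enron dataset.

```python
from collections import Counter
import math
import re


def tokenize(text):
    # Assumed pre-processing: lowercase and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def build_class_models(labelled_emails):
    # Assumption: a class model is the summed term-frequency profile
    # of all training e-mails carrying that label.
    models = {}
    for text, label in labelled_emails:
        models.setdefault(label, Counter()).update(tokenize(text))
    return models


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    # Counter returns 0 for missing terms without inserting them.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, models):
    # Score the unseen e-mail against every class model and
    # return the label of the highest-scoring model plus all scores.
    email = Counter(tokenize(text))
    scores = {label: cosine(email, model) for label, model in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy data for illustration only.
training = [
    ("Meeting agenda for the board meeting on Monday", "meetings"),
    ("Reschedule our meeting to Tuesday", "meetings"),
    ("Please review the attached contract draft", "legal"),
    ("Contract terms need legal review", "legal"),
]
models = build_class_models(training)
label, scores = classify("Can we move the meeting to Monday?", models)
print(label, scores)
```

Cosine similarity is used here because it normalizes for e-mail length, so a long class profile does not win simply by containing more terms; any other similarity measure could be substituted in `cosine` without changing the overall argmax structure.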
Description: B.Sc. Software Development
URI: https://www.um.edu.mt/library/oar/handle/123456789/47892
Appears in Collections: Dissertations - FacICT - 2019
Dissertations - FacICTCIS - 2019

Files in This Item:
File: 19BITSD016.pdf (Restricted Access), 2.22 MB, Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.