Title: Spam detection using machine learning techniques
Authors: Sciberras, Kyle (2015)
Keywords: Spam (Electronic mail)
Spam filtering (Electronic mail)
Machine learning
Issue Date: 2015
Citation: Sciberras, K. (2015). Spam detection using machine learning techniques (Bachelor's dissertation).
Abstract: The introduction of electronic mail brought about a reliable and economical method of communication. However, alongside its advantages came several disadvantages, one of which is spam. Spam is unsolicited mail that, in one way or another, finds its way into our inboxes. Over the years, spam has increased in volume and has developed ways of disguising itself as legitimate e-mail through deceptive appearances. More significantly, the last few decades have seen a large number of techniques designed to recognize spam and block it from reaching our inboxes. Rule sets and Machine Learning algorithms have been tried and tested as anti-spam filters to block as much spam as possible. This dissertation compares a set of developed artefacts with the Machine Learning algorithms tested in El-Sayed ElAlfy's paper Learning Methods for Spam Filtering, from the collection of research papers Computer Systems, Support and Technology (2011). Using the Spambase dataset to train and test the algorithms, 2 artefacts were created with the potential of being anti-spam filters. Artefact 1 proposes a method that is composed of N-grams and the entropy technique, combined with the Naive Bayes algorithm. Artefact 2 is a notable neural-network algorithm known as the Error Backpropagation Neural Network. Artefacts 1 and 2 were compared with other well-known Machine Learning algorithms such as Support Vector Machine, Multi-Layer Perceptron, k-NN, decision trees and others. With regard to results, Artefact 1 ranked first in precision at 96.16% when compared with the other 20 algorithms tested, followed by the Radial Basis Function algorithm at 95.52%. Artefact 2, with a precision of 93.44%, was ranked 18th, followed by the Naive Bayes algorithm with a precision of 93.29%. The accuracy achieved was 89.66% for Artefact 1 and 92.15% for Artefact 2.
From the aforementioned results, it can be seen that Artefact 1 can offer quite competitive results.
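The abstract reports that Artefact 1 ranks first in precision yet has lower accuracy than Artefact 2. The following minimal Python sketch illustrates how that can happen, using the standard definitions of the two metrics. The confusion-matrix counts and the names `cautious` and `balanced` are hypothetical and chosen only for illustration; they are not taken from the dissertation or from the Spambase dataset.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of messages flagged as spam that really are spam."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all messages (spam and legitimate) classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts for two filters on 100 spam and 100 legitimate messages.
# tp: spam caught, fn: spam missed, fp: legitimate mail flagged, tn: legitimate mail passed.
cautious = {"tp": 70, "fp": 2, "tn": 98, "fn": 30}   # flags few messages, rarely wrong
balanced = {"tp": 90, "fp": 8, "tn": 92, "fn": 10}   # flags more, with more false alarms

for name, counts in (("cautious", cautious), ("balanced", balanced)):
    p = precision(counts["tp"], counts["fp"])
    a = accuracy(**counts)
    print(f"{name}: precision={p:.4f} accuracy={a:.4f}")
```

In this sketch the cautious filter has the higher precision because it seldom flags legitimate mail, while the spam it lets through lowers its accuracy. This is the same pattern of trade-off as the one reported for Artefacts 1 and 2, although the sketch says nothing about why the artefacts themselves behave that way.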
Description: B.Sc. IT (Hons)(Melit.)
Appears in Collections:Dissertations - FacICT - 2015
Dissertations - FacICTCIS - 2010-2015

Files in This Item:
File: Restricted Access
Size: 5.76 MB
Format: Adobe PDF

Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.