Decision tree learning algorithms in a cloud computing environment, utilizing the map reduce programming framework

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/95316

Title:	Decision tree learning algorithms in a cloud computing environment, utilizing the map reduce programming framework
Authors:	Vella, Luke (2013)
Keywords:	Cloud computing; Genetic algorithms; Algorithms
Issue Date:	2013
Citation:	Vella, L. (2013). Decision tree learning algorithms in a cloud computing environment, utilizing the map reduce programming framework (Bachelor's dissertation).
Abstract:	Decision tree learning algorithms are among the most commonly used techniques for learning from sets of collected data [22] [21]. Their wide use in real-life applications means that such algorithms have to work on data sets of ever-increasing size and on machines with limited processing power. The recently introduced MapReduce framework is a parallel programming framework that enables users to develop parallel algorithms [36]. These parallel algorithms are executed in a grid-computing environment in order to utilize resources from the various machines connected to the grid. In this project a cloud-based grid-computing environment is used, which is pre-set to execute MapReduce algorithms. This project explores the use of Apache Hadoop, an open-source implementation of the MapReduce framework [14], to parallelize algorithms, particularly decision tree learning algorithms. An implementation of a parallelized version of Quinlan's [23] ID3 algorithm using the MapReduce framework is presented. Two more variations of the algorithm are also presented: one implemented using breadth-first tree induction and the other implemented in a partially parallel manner. The aim of these implementations is to propose other methods for implementing decision tree learning algorithms that execute more efficiently on larger data sets. Reference was made to other similar parallel decision tree learning algorithm implementations [19] [28] [22] in order to develop the aforementioned algorithm variations. The evaluation of the project shows that all the implemented algorithms successfully derive a decision tree from the data set supplied. The performance of the algorithms was compared, and the results show that a partially parallelized algorithm can be more efficient than a completely parallel one. The findings also show that implementing such algorithms using breadth-first induction has a hidden inefficiency, which could make the algorithm perform less efficiently.
Description:	B.Sc. IT (Hons)(Melit.)
URI:	https://www.um.edu.mt/library/oar/handle/123456789/95316
Appears in Collections:	Dissertations - FacICT - 2013; Dissertations - FacICTCIS - 2010-2015

Files in This Item:

File	Description	Size	Format
BSC(HONS)ICT_Vella_ Luke_2013.PDF Restricted Access		7.07 MB	Adobe PDF
