Title: Genetic algorithm based metaheuristic optimisation of machine learning algorithm parameters
Authors: Camilleri, Michel
Keywords: Machine learning
Genetic algorithms
Issue Date: 2017
Abstract: through design and experimentation. The search for optimal or near-optimal algorithm configuration parameters is one way to improve performance, and presents a research area of significant interest not only to researchers in the field but also to developers of new algorithms and to data analysts in diverse scientific fields. This optimisation process involves searching what is very often a large, possibly infinite, unique and irregular parameter space, which makes the search for optimality or near-optimality in feasible time difficult. Metaheuristic algorithms, with properties such as exploration, exploitation, use of acquired knowledge and stochastic elements, are a suitable class of solutions for this problem. Various studies have explored different metaheuristic or meta-optimisation approaches in the search for ever better and more generally applicable machine learner optimisers. The aim of this study was thus to explore and validate the use of a Genetic Algorithm based approach, often selected in studies of specific optimisation problems, as a generally applicable solution to the machine learning algorithm parameter optimisation problem. The Simple Genetic Algorithm (SGA) in particular was chosen as the focus of this study because of its clear metaheuristic qualities and its relatively simple and well-known evolutionary architecture. The performance of the SGA as a general meta-optimiser was measured through a series of experiments in which the SGA and other meta-optimiser algorithms were applied to the optimisation of a selected test base of machine learner algorithms and datasets with different characteristics, under various experimental conditions. In a novel departure from other studies, a close, measured look was also taken at the efficiency and effectiveness of the SGA in the meta-optimisation process.
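The SGA meta-optimisation loop described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual implementation: the two bounded hyperparameters and the synthetic accuracy surface standing in for a cross-validated machine learner evaluation are assumptions made purely for a self-contained example.

```python
import random

random.seed(42)

# Hypothetical parameter space: two bounded numeric hyperparameters.
BOUNDS = [(0.001, 1.0), (1.0, 100.0)]

def evaluate(params):
    """Stand-in fitness: a synthetic accuracy surface replacing the
    cross-validated machine learner evaluation used in the study."""
    x, y = params
    return 1.0 - ((x - 0.3) ** 2 + ((y - 40.0) / 100.0) ** 2)

def random_candidate():
    return [random.uniform(lo, hi) for lo, hi in BOUNDS]

def crossover(a, b):
    # Uniform crossover: each gene taken from either parent.
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(c, rate=0.1):
    # Each gene is resampled within its bounds with a small probability.
    return [random.uniform(lo, hi) if random.random() < rate else g
            for g, (lo, hi) in zip(c, BOUNDS)]

def sga(pop_size=20, epochs=30):
    pop = [random_candidate() for _ in range(pop_size)]
    for _ in range(epochs):
        scored = sorted(pop, key=evaluate, reverse=True)
        elite = scored[: pop_size // 2]  # truncation selection with elitism
        offspring = [mutate(crossover(random.choice(elite),
                                      random.choice(elite)))
                     for _ in range(pop_size - len(elite))]
        pop = elite + offspring
    return max(pop, key=evaluate)

best = sga()
```

Because the elite half of each epoch survives unchanged, the best fitness found never decreases across epochs; the study's measurements of search overhead concern exactly how many of the remaining evaluations are wasted on unproductive candidates.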
This necessitated the design and setup of a significant number of experiment runs in a set of phased studies. These experiments were run using multiple-fold cross-validation at both the machine learner and meta-optimiser levels for statistical validity, involving around 200 million machine learner evaluations executed over a set of ten standard workstations. The evaluation of an individual configuration took anywhere from a few milliseconds to over one and a half hours. An automated system was designed and developed by the author to run the planned experimentation and to gather performance data for subsequent analysis and reporting. The results showed consistent, though not statistically significant, indications that the SGA was on average a good, though not optimal, optimiser of machine learning algorithm parameters over the selected test base. It was also found that, without tuning, the SGA suffered from inefficiencies which reduced its overall effectiveness. Other secondary results and methodological insights obtained include:
1. the visualisation and measurement of the different accuracy parameter space landscapes,
2. the measurement of the overheads and inefficiencies of search,
3. the effect of evaluation time on machine learner performance,
4. the effect of dataset size and machine learner processing costs on meta-optimiser performance,
5. the effect of meta-optimiser tuning on its performance,
6. the value of low-cost pre-optimisation exploration of the optimisation problem, and
7. the general applicability of the measures developed or adapted for this study.
Two meta-optimiser algorithms were developed for comparative analysis. One was based on Iterated Local Search and employed a sampling of the parameter space neighbourhood of the current candidate for the local search. The other was a hybrid SGA which refined each epoch's best candidate through local search using the same neighbourhood sampling.
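The Iterated Local Search comparison algorithm, with its neighbourhood sampling of the current candidate, might look roughly like the sketch below. All specifics here are assumptions for illustration: the neighbourhood radius, sample size, perturbation strength and the synthetic accuracy surface are not taken from the thesis.

```python
import random

random.seed(7)

# Hypothetical parameter space: two bounded numeric hyperparameters.
BOUNDS = [(0.001, 1.0), (1.0, 100.0)]

def evaluate(params):
    # Stand-in for a cross-validated machine learner evaluation.
    x, y = params
    return 1.0 - ((x - 0.3) ** 2 + ((y - 40.0) / 100.0) ** 2)

def clip(g, lo, hi):
    return min(hi, max(lo, g))

def sample_neighbourhood(centre, radius=0.05, k=8):
    """Draw k candidates from a box around the current candidate,
    scaled to each parameter's range and clipped to its bounds."""
    return [[clip(g + random.uniform(-radius, radius) * (hi - lo), lo, hi)
             for g, (lo, hi) in zip(centre, BOUNDS)]
            for _ in range(k)]

def local_search(start, steps=50):
    current = start
    for _ in range(steps):
        best_nbr = max(sample_neighbourhood(current), key=evaluate)
        if evaluate(best_nbr) > evaluate(current):
            current = best_nbr  # accept only improving moves
    return current

def perturb(c, strength=0.3):
    # A stronger random kick to escape the current local optimum.
    return [clip(g + random.uniform(-strength, strength) * (hi - lo), lo, hi)
            for g, (lo, hi) in zip(c, BOUNDS)]

def iterated_local_search(iterations=10):
    best = local_search([random.uniform(lo, hi) for lo, hi in BOUNDS])
    for _ in range(iterations):
        candidate = local_search(perturb(best))
        if evaluate(candidate) > evaluate(best):
            best = candidate
    return best

best = iterated_local_search()
```

The hybrid SGA mentioned above would reuse the same `local_search` routine, applying it to the best candidate of each genetic epoch rather than to a perturbed restart point.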
The contribution of this study lies in the sum of its outcomes and the potential it holds for further research opportunities.
Description: PH.D.IT
Appears in Collections:Dissertations - FacICT - 2017
Dissertations - FacICTCIS - 2017

Files in This Item:
File: PhD MIchel Camilleri - Nov 2017.pdf (Restricted Access)
Size: 8.14 MB
Format: Adobe PDF

Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.