Using evolutionary algorithms for DFA learning

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/92198

Title:	Using evolutionary algorithms for DFA learning
Authors:	Grech, Francesco (2021)
Keywords:	Genetic programming (Computer science) Sequential machine theory Algorithms Heuristic algorithms Evolutionary computation
Issue Date:	2021
Citation:	Grech, F. (2021). Using evolutionary algorithms for DFA learning (Bachelor’s dissertation).
Abstract:	Grammatical inference is the task of learning a formal grammar or class description from a set of training examples. In this research, we tackle the task of finding the minimum state Deterministic Finite-state Automaton (DFA) that is consistent with the training data, which is an NP-Complete problem. The problem instances that we are concerned with consist of an unknown DFA whose alphabet is binary, similar to those used in the Abbadingo One competition. A dataset of strings is labelled by this DFA and given to the learner. The learner constructs a hypothesis (DFA) with this dataset, whose accuracy is calculated using another set of strings, called the testing set. A common and successful approach to solving this problem is by building a maximally specific hypothesis called an Augmented Prefix Tree Acceptor (APTA) that embeds the training data exactly. States in a DFA can be merged to reduce its size. From the APTA, many merges are possible, and heuristics such as the Abbadingo-winning Evidence-Driven State Merging (EDSM) algorithm can be used to score the merges. EDSM does this by comparing labels of states which are to be merged. The highest-scoring merge is performed, and this process repeats until no more merges are possible, resulting in the final DFA. In this project, we evolve these types of heuristics using a Genetic Program (GP). Running this GP multiple times results in multiple viable heuristics. The performance of these heuristics can be increased by forming an ensemble, which when faced with a problem instance, each heuristic in it generates a hypothesis. Then, these DFAs parse the testing set using one of many modes, such as a fair majority vote. In such case, each string is parsed by each DFA, and its label is determined by the majority’s verdict. In this project, we design and implement a GP with several features that ensure proper representation and diversity of heuristics. We also experiment with different types of GP runs that focus on evolving heuristics for other purposes. Several heuristics were evolved which in turn were used to form ensembles of different sizes. Furthermore, the several ensemble modes are also evaluated using a number of problem instances. Over 16-, 32-, 64- and 128-state problem instances, the evolved individual heuristics perform roughly as well as W-EDSM, a variation of EDSM. However, when forming an ensemble with these heuristics, we see a significant boost in performance, with some ensembles performing nearly three times better than W-EDSM. Furthermore, smaller ensembles of sizes 10 and 20 achieve 1.5 – 2 times the performance of W-EDSM in some cases while still keeping the execution time linear in the size of the ensemble.
Description:	B.Sc. IT (Hons)(Melit.)
URI:	https://www.um.edu.mt/library/oar/handle/123456789/92198
Appears in Collections:	Dissertations - FacICT - 2021 Dissertations - FacICTAI - 2021

Files in This Item:

File	Description	Size	Format
21BITAI025.pdf Restricted Access		1.09 MB	Adobe PDF	View/Open Request a copy

Show full item record Statistics