Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/39496
Full metadata record
DC Field | Value | Language
dc.date.accessioned | 2019-02-05T10:02:43Z | -
dc.date.available | 2019-02-05T10:02:43Z | -
dc.date.issued | 2018 | -
dc.identifier.citation | Pullicino, K. (2018). A MapReduce approach to genome alignment (Master's dissertation). | en_GB
dc.identifier.uri | https://www.um.edu.mt/library/oar//handle/123456789/39496 | -
dc.description | M.SC.ARTIFICIAL INTELLIGENCE | en_GB
dc.description.abstract | Recent years have brought enormous growth in DNA sequencing capacity and speed, thanks to the application of Next-Generation Sequencing (NGS) technologies. The alignment of read sequences to a given reference genome is crucial for further diagnostic downstream analysis. Finding the optimal alignment of short DNA reads from a biological sample to a reference human genome requires big data techniques, since the read data are in the region of 200 GB in size. In this dissertation we present two approaches to distributed sequence alignment of genomic data, both based on the MapReduce programming paradigm. MR-BWA distributes BWA, an industry-standard tool for aligning genomic reads, in a different manner from existing work. MR-BWT-FM presents low-level optimizations to suffix array and BWT creation, which are used to build a custom FM-Index that is in turn used for distributed genome sequence alignment. The application's output provides insights and charts about the results. We evaluate the performance and correctness of both approaches by comparing our output with that of similar tools, using standard datasets from the 1000 Genomes Project. Performance and correctness results for both distributed approaches are comparable with those of similar tools, whilst the final custom FM-Index is smaller than the standard BWA index. The source code of the software described in this dissertation is publicly available at https://github.com/kpullu/msc. | en_GB
dc.language.iso | en | en_GB
dc.rights | info:eu-repo/semantics/restrictedAccess | en_GB
dc.subject | DNA | en_GB
dc.subject | Gene mapping | en_GB
dc.subject | Genomics -- Methods | en_GB
dc.title | A MapReduce approach to genome alignment | en_GB
dc.type | bachelorThesis | en_GB
dc.rights.holder | The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder. | en_GB
dc.publisher.institution | University of Malta | en_GB
dc.publisher.department | Faculty of Information and Communication Technology. Department of Artificial Intelligence | en_GB
dc.description.reviewed | N/A | en_GB
dc.contributor.creator | Pullicino, Karl | -
Appears in Collections:
Dissertations - FacICT - 2018
Dissertations - FacICTAI - 2018

Files in This Item:
File | Description | Size | Format
18MAIPT07.pdf (Restricted Access) | | 2.15 MB | Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.