Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/72818
Title: Approximate bayesian clustering of genomes
Authors: Catania, Romario (2017)
Keywords: Bayesian statistical decision theory
Genomics -- Statistical methods
Single nucleotide polymorphisms -- Statistical methods
Multivariate analysis
Dirichlet problem
Issue Date: 2017
Citation: Catania, R. (2017). Approximate bayesian clustering of genomes (Bachelor's dissertation).
Abstract: The main aim of this project is to investigate Bayesian techniques used to understand structure in genomic (population genetics) data. These techniques are used to understand population history, as controls in disease association studies, and in forensic studies. Specifically, we have used SNP (Single Nucleotide Polymorphism) data, which are categorical. Bayesian techniques have become very popular in biology, especially in genetics. However, they suffer from two issues: prior specification and high computational cost. The first issue is tackled by specifying a prior based on the Dirichlet Process. We sought to justify this prior by reviewing its theory and the theory of stochastic processes related to genetics. For the second issue, we use Variational Bayes, an estimation method used by computer scientists for similar models in search engine technology, where the aim is to obtain fast, approximate results from large text-based data sets. We evaluated these models, with parameters estimated through Variational Bayes, on public population genetic data. We compared their performance with results from Principal Component Analysis and with the ground truth population labels. We have also fitted a Dirichlet Process model on text data to show an example where these models are unsuitable.
Description: B.SC.(HONS)STATS.&OP.RESEARCH
URI: https://www.um.edu.mt/library/oar/handle/123456789/72818
Appears in Collections: Dissertations - FacSci - 2017
Dissertations - FacSciSOR - 2017

Files in This Item:
File: 17BSCCISSOR001.pdf (Restricted Access)
Size: 3.12 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.