OAR@UM Collection: https://www.um.edu.mt/library/oar/handle/123456789/62785 2026-04-27T20:38:41Z 2026-04-27T20:38:41Z Hidden Markov models and their extensions with applications in finance and gambling https://www.um.edu.mt/library/oar/handle/123456789/63173 2020-11-08T06:30:21Z 2020-01-01T00:00:00Z

Title: Hidden Markov models and their extensions with applications in finance and gambling Abstract: Hidden Markov models (HMMs) are time series models which incorporate serial dependence via a latent (hidden) discrete-time Markov chain (DTMC). Standard HMMs assume that the distribution which generates an observation at a particular point in time depends on the state of the latent DTMC at that point in time. HMMs also motivate several important extensions. One such extension is the hidden semi-Markov model (HSMM). HSMMs generalize HMMs by allowing dwell-time distributions in states to be modelled explicitly instead of relying on the geometric distribution assumption imposed by the HMM setup. Thus, state changes and state persistence are now controlled by what is called a semi-Markov chain. The application that follows uses HMMs and HSMMs with normal state-dependent distributions to model daily returns of the S&P 500 Index and the BTC/USD exchange rate. The aim is to identify market regimes, mainly bull and bear market phases. Another important extension to the standard HMM is to allow for the inclusion of time-varying covariates in the state-dependent parameters. Through appropriate link functions, the state-dependent means can be allowed to change, not only according to the state, but also according to covariate information. This framework leads to the HMM-GLM hybrid, where HMM Regression (HMMR) is a special case when the state-dependent distributions are assumed normal. The application that follows models problem gambling behaviour through the use of HMMs with seasonal adjustments. The aim is to identify inactive, moderately active, and highly active periods by monitoring players’ gambling history. Description: M.SC.STATISTICS
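
The two ideas in this abstract that lend themselves to a small illustration are the likelihood of an HMM with normal state-dependent distributions and the geometric dwell-time assumption that HSMMs relax. The following is a minimal sketch, not the dissertation's implementation: the function names (`hmm_log_likelihood`, `geometric_dwell_pmf`) and all parameter values are hypothetical, and the scaled forward algorithm is assumed as the way to evaluate the likelihood.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def hmm_log_likelihood(obs, init, trans, means, sds):
    """Log-likelihood of obs under an HMM with normal state-dependent
    distributions, computed with the scaled forward algorithm.

    init[j]     : probability of starting in state j
    trans[i][j] : probability of moving from state i to state j
    """
    n_states = len(init)
    # Forward probabilities for the first observation.
    alpha = [init[j] * normal_pdf(obs[0], means[j], sds[j]) for j in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the Markov chain, then weight by the densities.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states))
                * normal_pdf(obs[t], means[j], sds[j])
                for j in range(n_states)
            ]
        # Rescale at every step to avoid numerical underflow.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def geometric_dwell_pmf(p_stay, d):
    """P(dwell time = d) in an HMM state with self-transition probability p_stay."""
    return p_stay ** (d - 1) * (1.0 - p_stay)


# Hypothetical two-regime example (illustrative values, not fitted to any data).
returns = [0.4, -0.1, 0.3, -2.1, 1.8, -1.5]
log_lik = hmm_log_likelihood(
    returns,
    init=[0.5, 0.5],
    trans=[[0.95, 0.05], [0.10, 0.90]],
    means=[0.1, -0.2],
    sds=[0.5, 2.0],
)
print(log_lik)
```

In an HMM the dwell time in a state is implicitly geometric, as `geometric_dwell_pmf` shows; an HSMM would replace that function with an explicitly chosen dwell-time distribution.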

2020-01-01T00:00:00Z Topic modelling of newspaper comments using embedding vectors and clustering https://www.um.edu.mt/library/oar/handle/123456789/63172 2020-11-08T06:30:55Z 2020-01-01T00:00:00Z

Title: Topic modelling of newspaper comments using embedding vectors and clustering Abstract: As the use of the Internet and online social media increases, text is becoming an ever more important source of data. To this end, a vast number of techniques have been developed in the field of Natural Language Processing. These include n-grams, skip-grams, the Bag-of-Words model, Term Frequency-Inverse Document Frequency, stemming, lemmatisation, embedding vectors, and clustering techniques. This dissertation investigates the theoretical foundations of these techniques and then applies them to a dataset of online newspaper comments, written between 2008 and 2017, obtained from the Times of Malta website. In particular, the FastText algorithm (Bojanowski et al., 2017) is used to transform each unique word in the dataset to a vector representation known as a word embedding by means of an underlying neural network framework. The word embeddings are then used to obtain clusters by means of the k-means clustering algorithm. Vector representations are also obtained for each online newspaper comment, where again similar comments are assigned similar representations. The obtained representations, which are known as document embeddings, are then also clustered using k-means clustering. The results obtained from the in-depth analysis of the data show that the vast majority of comments are political in nature, with comments related to sports, arts and culture being less frequent than possibly expected. In addition, a number of topics were identified as being more prevalent during some time periods than during others. These include divorce in 2011, as well as Maltese citizenship in 2013 and Russia’s annexation of Crimea in 2014. Furthermore, the morning-after pill and corruption were two topics that were highly discussed in 2016. Description: B.SC.(HONS)STATS.&OP.RESEARCH
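
The clustering step described in this abstract can be sketched with a plain k-means (Lloyd's algorithm) routine. This is only an illustration under stated assumptions: the two-dimensional vectors below are made-up stand-ins for word embeddings, not FastText output, and the function name `kmeans` and the choice of initial centroids are hypothetical.

```python
import math


def kmeans(points, centroids, n_iter=100):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest centroid and recomputing each centroid as the mean of its points."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: mean of the members, keeping the old centroid if a cluster is empty.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Made-up 2-D "embeddings" purely for illustration.
words = ["election", "minister", "football", "goal"]
vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

labels, centroids = kmeans(vectors, centroids=[vectors[0], vectors[2]])
for word, label in zip(words, labels):
    print(word, label)
```

The same routine applies unchanged to document embeddings, since it only needs a list of equal-length numeric vectors.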

2020-01-01T00:00:00Z Univariate and multivariate change-point analysis with application to cryptocurrency time series https://www.um.edu.mt/library/oar/handle/123456789/63171 2020-11-08T06:28:43Z 2020-01-01T00:00:00Z

Title: Univariate and multivariate change-point analysis with application to cryptocurrency time series Abstract: In recent years, cryptocurrencies have increased in popularity, especially Bitcoin, and they have gone through numerous events that caused them to experience changes in their price distribution. In this dissertation, we aim to detect these changes by minimising a cost function over possible numbers and locations of change-points. These functions are typically formulated as the sum of the segment costs plus a penalty term which increases as the number of change-points increases. We will first estimate the changes in the mean only, in the variance only and in both mean and variance in the log-returns of Bitcoin. Then, we will estimate the changes in the mean vector only, in the covariance matrix only and in both mean vector and covariance matrix in the log-returns of four cryptocurrencies, which are Bitcoin, Ethereum, Ripple and Litecoin. Three search methods will be used to find the optimal solution and will be compared on accuracy and computational time using different penalties: binary segmentation, segment neighbourhood and PELT. Afterwards, we will use a method to find the optimal segmentations over a range of penalty values and graphically identify a suitable penalty choice. Description: B.SC.(HONS)STATS.&OP.RESEARCH
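
The penalised-cost formulation in this abstract can be illustrated for the simplest case, a change in mean with a squared-error segment cost. The sketch below uses optimal partitioning, the exact dynamic programme to which PELT adds a pruning step; it is not the dissertation's code, and the function names, the toy series and the penalty values are all assumptions made for illustration.

```python
def segment_cost(prefix, prefix_sq, start, end):
    """Squared-error cost of data[start:end] around its own mean,
    computed in O(1) from prefix sums."""
    n = end - start
    s = prefix[end] - prefix[start]
    sq = prefix_sq[end] - prefix_sq[start]
    return sq - s * s / n


def optimal_partitioning(data, penalty):
    """Minimise (sum of segment costs) + penalty * (number of change-points).

    Returns the change-point locations as indices where a new segment starts.
    """
    n = len(data)
    prefix, prefix_sq = [0.0], [0.0]
    for x in data:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    # best[end] is the optimal penalised cost of data[:end];
    # starting at -penalty means the first segment is not penalised.
    best = [0.0] * (n + 1)
    best[0] = -penalty
    last = [0] * (n + 1)  # start index of the final segment in the optimum
    for end in range(1, n + 1):
        candidates = [
            (best[start] + segment_cost(prefix, prefix_sq, start, end) + penalty, start)
            for start in range(end)
        ]
        best[end], last[end] = min(candidates)

    # Walk back through the stored segment starts to recover the change-points.
    changepoints = []
    end = n
    while last[end] > 0:
        changepoints.append(last[end])
        end = last[end]
    return sorted(changepoints)


# Toy series with one obvious shift in mean (illustrative, not real returns).
series = [0.0] * 5 + [10.0] * 5
print(optimal_partitioning(series, penalty=1.0))
print(optimal_partitioning(series, penalty=1000.0))
```

Running the function over a grid of penalty values and recording how many change-points each one yields gives a simple way to see how the segmentation depends on the penalty.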

2020-01-01T00:00:00Z Predicting risk of gestational diabetes mellitus through nearest neighbour classification https://www.um.edu.mt/library/oar/handle/123456789/63170 2020-11-08T06:29:18Z 2020-01-01T00:00:00Z

Title: Predicting risk of gestational diabetes mellitus through nearest neighbour classification Abstract: Gestational diabetes mellitus is a specific type of diabetes which arises as a complication of pregnancy, and which can adversely affect both the mother and the child. Diagnosis of this condition is carried out through screening coupled with an oral glucose tolerance test; however, these prove to be quite expensive to carry out. Ideally, therefore, a prior clinical risk assessment would filter out any individuals who are not at risk of developing this condition, thus avoiding the need to perform these costly tests. The prediction of risk of gestational diabetes mellitus is here formulated as a binary classification problem, with nearest neighbour methods being quite popular in this area of study. The k-Nearest Neighbour, Fixed-Radius Nearest Neighbour and Kernel classifiers are applied to a dataset consisting of pregnant women from 11 Mediterranean countries. Binary logistic regression is also applied in order to compare its performance to that of nearest neighbour methods. The classification techniques are then compared using various performance measures to determine which methods are the best at predicting positive cases of gestational diabetes mellitus. Description: B.SC.(HONS)STATS.&OP.RESEARCH
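
The binary classification setup in this abstract can be sketched with a basic k-Nearest Neighbour classifier using Euclidean distance and majority voting. This is an illustrative sketch only: the function name `knn_predict`, the value of `k` and the two-dimensional toy points are assumptions, and the points are made up rather than drawn from the dissertation's dataset.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Made-up feature vectors; label 1 marks the positive class, 0 the negative class.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, (0.5, 0.5), k=3))
print(knn_predict(train_x, train_y, (5.5, 5.5), k=3))
```

A fixed-radius variant would replace the `[:k]` slice with a filter that keeps every training point within a chosen distance of the query.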

2020-01-01T00:00:00Z