<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="https://www.um.edu.mt/library/oar/handle/123456789/106752">
    <title>OAR@UM Collection:</title>
    <link>https://www.um.edu.mt/library/oar/handle/123456789/106752</link>
    <description />
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="https://www.um.edu.mt/library/oar/handle/123456789/121904" />
        <rdf:li rdf:resource="https://www.um.edu.mt/library/oar/handle/123456789/120602" />
        <rdf:li rdf:resource="https://www.um.edu.mt/library/oar/handle/123456789/120600" />
        <rdf:li rdf:resource="https://www.um.edu.mt/library/oar/handle/123456789/120599" />
      </rdf:Seq>
    </items>
    <dc:date>2026-04-20T18:38:16Z</dc:date>
  </channel>
  <item rdf:about="https://www.um.edu.mt/library/oar/handle/123456789/121904">
    <title>Comparative study on reusable multilingual approaches for Maltese sentiment analysis</title>
    <link>https://www.um.edu.mt/library/oar/handle/123456789/121904</link>
    <description>Title: Comparative study on reusable multilingual approaches for Maltese sentiment analysis
Abstract: Sentiment analysis (SA) can identify the sentiment of news topics such as abortion, immigration, the death penalty, etc. Sentiment identification is important because it automates the task of manually checking how the author feels about a topic. Ideally, topics in news articles should be covered neutrally, but owing to differing agendas, political bias can exist in those articles. The proposed research aims at learning from, using, and contributing to natural language processing research on the Maltese language. SA is also important for consumers and for companies that conduct surveys gathering opinions on a particular service or product, and it can matter for a country’s national security and public opinion analysis (Yue et al., 2019). Two approaches are investigated and compared. The first combines an English data set with Maltese and Italian data sets translated into English for training; the trained models are then tested on Maltese texts that are also translated into English. In the second approach, the same data sets are translated into Maltese instead of English; the testing phase is similar to the first approach, but the Maltese test set is not translated. To identify the polarity of the text, support vector machines (SVMs) and long short-term memory (LSTM) networks are used. There are three sentiment labels: positive, negative, and neutral. Finally, the models are evaluated at both sentence and document level. The methodology is evaluated along several dimensions: data set distribution, number of domains and languages, comparison with other peer-reviewed literature, performance, filtering, SA level (document and sentence), algorithms (SVM and LSTM), and a two-label setting (positive and negative).
Several findings emerged during the experimentation phase of this work. For short texts, the negative label works best in the second approach with LSTM (224 of 485 negative texts classified correctly), the neutral label works best in the first approach with LSTM (141 of 178 neutral texts classified correctly), and the positive label works best in the first approach with SVM (102 of 237 positive texts classified correctly). For long texts, the negative label works best with SVM in both approaches (5 of 5 negative documents classified correctly), the neutral label works best with LSTM in both approaches (3 of 5 documents classified correctly), and the positive label works best in the first approach with SVM (2 of 5 documents classified correctly). Moreover, filtering produced worse results; this could be because removing neutral features confused the sentiment analysis. When only one data set language (i.e., English) is used, results generally worsen, apart from the negative class, which performed consistently well. When only two labels are used, results improve for both positive and negative. During the experiments, SVM was better at predicting the three labels in three out of four cases; LSTM was equally good as SVM in one of the four, but in all the experiments LSTM was never better than SVM.
Description: M.Sc.(Melit.)</description>
    <dc:date>2023-01-01T00:00:00Z</dc:date>
  </item>
  <item rdf:about="https://www.um.edu.mt/library/oar/handle/123456789/120602">
    <title>Improving portfolio construction using deep generative machine learning models applying generative models on financial market data</title>
    <link>https://www.um.edu.mt/library/oar/handle/123456789/120602</link>
    <description>Title: Improving portfolio construction using deep generative machine learning models applying generative models on financial market data
Abstract: The lack of availability of financial data has posed serious challenges to practitioners and researchers alike, especially when addressing portfolio and risk management problems with machine learning techniques. To alleviate these challenges, there has been rising interest in recent years in using deep generative models to generate synthetic time series data and augment existing datasets. Besides their success in generating synthetic images in the computer vision domain, Generative Adversarial Networks (GANs) have recently also been used in practical time series applications. For synthetic time series data to be useful in portfolio management, it must not only possess comparable statistical properties but also exhibit a correlation structure similar to that of ground-truth data. In this work we examine the correlation characteristics of synthetic financial time series data generated by a deep generative model, TimeGAN, and carry out a holistic assessment comprising visual evaluation of stylized facts and quantitative evaluation of correlation similarity. We run experiments with a dataset containing features of a single stock, following the original authors of TimeGAN, and with additional datasets containing the market data of multiple stocks spanning a broad range of pairwise correlation coefficients. We demonstrate that TimeGAN-generated market data preserves the correlation structure well for the multi-stock datasets examined. Moreover, we propose a GAN-assisted portfolio construction technique that can be integrated with traditional portfolio management methods. The proposed scheme is framed as an extension of an established deep learning portfolio optimisation technique proposed by Cai et al. (2019), against which we benchmark our study, in which we utilise TimeGAN to generate correlated synthetic future price paths for a set of stocks.
Using the generated price paths, we introduce an efficient way of de-risking the portfolio by filtering out stocks that are expected to exhibit high volatility out-of-sample. We carry out experiments with two filtering approaches, global and adaptive, and demonstrate that using correlation-aware synthetic data together with real historical market data in a systematic manner improves the out-of-sample portfolio Sharpe ratio by 18.1% and the cumulative portfolio return by 46.8% compared with a benchmark portfolio constructed from historical data only. The encouraging results achieved in this study suggest that, in financial settings where time series data is limited, combining historical data with correlation-informed synthetic data in the construction of risky portfolios can potentially help financial practitioners make better investment decisions.
Description: M.Sc.(Melit.)</description>
    <dc:date>2023-01-01T00:00:00Z</dc:date>
  </item>
  <item rdf:about="https://www.um.edu.mt/library/oar/handle/123456789/120600">
    <title>A graph theoretic approach to rapid transit network design</title>
    <link>https://www.um.edu.mt/library/oar/handle/123456789/120600</link>
    <description>Title: A graph theoretic approach to rapid transit network design
Abstract: Designing rapid transit networks that maximise ridership or coverage is a critical challenge in transportation planning. However, this is an NP-hard problem, and given the investment required to build these networks, designing the best network possible is a priority. In this research, we propose a novel data-driven approach to designing transit networks that automatically identifies station locations and constructs an optimised network maximising ridership within a fixed budget. Our approach uses a combination of density-based scanning, spatial partitioning, branch-and-bound, and genetic algorithms to construct an optimised network. We evaluate candidate solutions by their expected modal shift, which measures the number of journeys the new network is expected to capture. Our approach is flexible and can also be applied to expand existing networks, optimise for a coverage goal, or design multimodal networks that integrate feeder networks with rapid transit stations. To evaluate the effectiveness of our approach, we present a method for generating synthetic datasets that can be used to test our algorithm and other network design algorithms. Our approach was successful in testing, designing networks that achieved good expected modal shift both on synthetic datasets and in a case study on the city of Bogotá, Colombia. On the Bogotá dataset, one solution with a €200 million budget achieved an expected modal shift of 4.87%. Overall, our approach contributes to the field of transportation planning by providing a data-driven method that can design complete networks given only travel survey data. Additionally, we show how the problem can be framed using graph theory, with parts of the problem mapped to the heaviest subgraph problem and the travelling salesman problem. Future work could build on our method by improving evaluation accuracy with agent-based simulation, or by using graph neural networks in place of branch-and-bound.
Description: M.Sc.(Melit.)</description>
    <dc:date>2023-01-01T00:00:00Z</dc:date>
  </item>
  <item rdf:about="https://www.um.edu.mt/library/oar/handle/123456789/120599">
    <title>Predicting gambling addiction using knowledge graph and machine learning</title>
    <link>https://www.um.edu.mt/library/oar/handle/123456789/120599</link>
    <description>Title: Predicting gambling addiction using knowledge graph and machine learning
Abstract: The aim of this study is to apply machine learning and knowledge graph technologies to the online gambling domain, to help predict players at risk of developing gambling addiction. While previous research has often focused exclusively on Voluntary Self-Exclusion, we believe that use of the Voluntary Self-Exclusion tool is not always a good proxy for gambling addiction, as players might use this tool simply as a quick way to close their accounts. In our study we used features related to the customers’ behaviour, based on the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-5), to find the most accurate technique among five machine learning models (random forest, naive Bayes, gradient boosting, logistic regression, k-NN) and five Knowledge Graph Embedding models (TransE, TransR, DistMult, ComplEx, RotatE). We also investigated techniques to balance our dataset, as at-risk players made up only around 1% of the population. Furthermore, we experimented with the Shapley Additive Explanations (SHAP) technique to understand the reasons behind each detection. We also investigated how well gambling addiction prediction works on graphs. More specifically, we built a Knowledge Graph of our customers’ behaviour, called BKG, and implemented Knowledge Graph Embedding models so that we could find similarities between players who had used Voluntary Self-Exclusion tools and players who had not at the time of this study. Finally, we used the embeddings of ComplEx as input to the gradient boosting model. We concluded that using DSM-5 features with machine learning and Knowledge Graph Embedding models is a promising approach. Our best machine learning model was gradient boosting, which achieved an AUC-ROC of 0.86. For our Knowledge Graph Embedding experiments, the best model was ComplEx, with a mean rank (MR) of 1.86.
The experiment checking for players with similar features also performed well: we were able to detect around 66% of the players who had not used Voluntary Self-Exclusion at the time of the study but would go on to use it after the data had been collected. On the other hand, our approach of using the embeddings as input to the gradient boosting model did not perform as expected, as the most accurate model remained gradient boosting on the tabular data.
Description: M.Sc.(Melit.)</description>
    <dc:date>2023-01-01T00:00:00Z</dc:date>
  </item>
</rdf:RDF>

