Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/141988
Title: Exploring SARS‐CoV‐2 antibody data through machine learning methods
Authors: Chircop, Francesca (2026)
Keywords: Immunity
Immunoglobulins -- Malta
Machine learning
Bioinformatics -- Malta
COVID-19 (Disease) -- Malta
Issue Date: 2026
Citation: Chircop, F. (2026). Exploring SARS‐CoV‐2 antibody data through machine learning methods (Master's dissertation).
Abstract: The diversity and binding specificity of antibody repertoires, particularly within the Complementarity Determining Region H3 (CDRH3), are fundamental to immune protection and therapeutic antibody development. In this work, we integrate hierarchical clustering, motif discovery, sequence‐based machine learning, and user‐friendly graphical interfaces into a bioinformatics framework to: (1) identify and classify SARS‐CoV‐2 CDRH3 families, (2) extract conserved paratope motifs, (3) predict antibody–antigen binding using both classical and deep learning models, and (4) enable accessible sequence translation and clustering through two standalone Graphical User Interfaces (GUIs). Our analysis is based on sequences from two distinct sources: the public CoV‐AbDab database curated by the Oxford Protein Informatics Group (OPIG), and proprietary scFv phage display data obtained via Sanger sequencing at the University of Malta. Through clustering, we identify eight major public clonotypes in both the University of Malta dataset and the OPIG dataset, each with distinct sequence logos. A stacking ensemble comprising logistic regression, random forest, and Extreme Gradient Boosting (XGBoost) attains a Receiver Operating Characteristic Area Under the Curve (ROC‐AUC) of 0.71, outperforming each individual model, including traditional machine learning approaches (logistic regression, random forest, XGBoost) and deep learning architectures (Bidirectional Long Short‐Term Memory (Bi‐LSTM), ProtBERT, Siamese Convolutional Neural Network (CNN)). In usability trials, biomedical students with no coding experience installed our GUIs; performed DNA‐to‐protein translation, variable annotation, clustering, and figure export in under five minutes; and contributed feedback that led to a highly intuitive interface.
Together, these results provide a reproducible, end‐to‐end toolkit for rapid in silico antibody repertoire analysis and binding prediction, supporting both computational immunology research and experimental planning.
Description: M.Sc.(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/141988
Appears in Collections: Dissertations - FacICT - 2026

Files in This Item:
File: 2518ICTICT501200012600_01.PDF
Size: 12.91 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.