Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/141988

| Field | Value |
|---|---|
| Title: | Exploring SARS‐CoV‐2 antibody data through machine learning methods |
| Authors: | Chircop, Francesca (2026) |
| Keywords: | Immunity; Immunoglobulins -- Malta; Machine learning; Bioinformatics -- Malta; COVID-19 (Disease) -- Malta |
| Issue Date: | 2026 |
| Citation: | Chircop, F. (2026). Exploring SARS‐CoV‐2 antibody data through machine learning methods (Master's dissertation). |
| Abstract: | The diversity and binding specificity of antibody repertoires, particularly within the Complementarity Determining Region H3 (CDRH3), are fundamental to immune protection and therapeutic antibody development. In this work, we integrate hierarchical clustering, motif discovery, sequence‐based machine learning, and user‐friendly graphical interfaces into a bioinformatics framework to: (1) identify and classify SARS‐CoV‐2 CDRH3 families, (2) extract conserved paratope motifs, (3) predict antibody–antigen binding using both classical and deep learning models, and (4) enable accessible sequence translation and clustering through two standalone Graphical User Interfaces (GUIs). Our analysis is based on sequences from two distinct sources: the public CoV‐AbDab database curated by the Oxford Protein Informatics Group (OPIG), and proprietary single‐chain variable fragment (scFv) phage display data obtained via Sanger sequencing at the University of Malta. Through clustering, we identify eight major public clonotypes in both the University of Malta dataset and the OPIG dataset, each with a distinct sequence logo. A stacking ensemble comprising logistic regression, random forest, and Extreme Gradient Boosting (XGBoost) attains a Receiver Operating Characteristic Area Under the Curve (ROC‐AUC) of 0.71, outperforming every individual model tested, both the traditional machine learning approaches (logistic regression, random forest, XGBoost) and the deep learning architectures (Bidirectional Long Short‐Term Memory (Bi‐LSTM), ProtBERT, and a Siamese Convolutional Neural Network (CNN)). In usability trials, biomedical students with no coding experience installed our GUIs, performed DNA‐to‐protein translation, variable‐region annotation, clustering, and figure export in under five minutes, and contributed feedback that led to a highly intuitive interface. Together, these results provide a reproducible, end‐to‐end toolkit for rapid in silico antibody repertoire analysis and binding prediction, supporting both computational immunology research and experimental planning. |
| Description: | M.Sc.(Melit.) |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/141988 |
| Appears in Collections: | Dissertations - FacICT - 2026 |
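The abstract states that CDRH3 families were identified by hierarchical clustering, but this record does not specify the distance measure, linkage, or cut‐off. Purely as an illustration, and not the dissertation's own code, the sketch below assumes a length‐normalized Levenshtein distance and single linkage cut at a hypothetical threshold of 0.2. Cutting a single‐linkage tree at a fixed height is equivalent to taking connected components of the "distance ≤ threshold" graph, which is what the union‐find loop computes. The function names and the four toy sequences are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer sequence (an assumed choice)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def single_linkage_clusters(seqs: list[str], threshold: float) -> list[list[str]]:
    """Single-linkage clusters at a fixed cut height, via union-find.

    Two sequences end up in the same cluster if a chain of pairwise
    distances <= threshold connects them.
    """
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if normalized_distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


# Hypothetical toy CDRH3 sequences (not from CoV-AbDab or the UM dataset).
toy = ["ARDYGGNSYFDY", "ARDYGGNSYFDV", "AKWPQLLHMEAT", "AKWPQLLHMEAS"]
print(single_linkage_clusters(toy, threshold=0.2))
# [['ARDYGGNSYFDY', 'ARDYGGNSYFDV'], ['AKWPQLLHMEAT', 'AKWPQLLHMEAS']]
```

Single linkage is used here only because it reduces to a short union‐find. Average or complete linkage, identity‐based thresholds restricted to equal‐length CDRH3s, or a library routine such as SciPy's hierarchical clustering are equally plausible choices, and the dissertation itself should be consulted for the method actually used.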
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| 2518ICTICT501200012600_01.PDF | | 12.91 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
