Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/138425
Title: miRBench: novel benchmark datasets for microRNA binding site prediction that mitigate against prevalent microRNA frequency class bias
Authors: Sammut, Stephanie
Gresova, Katarina
Tzimotoudis, Dimosthenis
Marsalkova, Eva
Cechak, David
Alexiou, Panagiotis
Keywords: MicroRNAs
Genetic regulation
Computational biology
Machine learning
Benchmarking (Management)
Bioinformatics -- Data processing
Issue Date: 2025
Citation: Sammut, S., Gresova, K., Tzimotoudis, D., Marsalkova, E., Cechak, D., & Alexiou, P. (2025). miRBench: novel benchmark datasets for microRNA binding site prediction that mitigate against prevalent microRNA frequency class bias. bioRxiv, 1-12.
Abstract: Motivation: MicroRNAs (miRNAs) are crucial regulators of gene expression, but the precise mechanisms governing their binding to target sites remain unclear. A major contributing factor to this is the lack of unbiased experimental datasets for training accurate prediction models. While recent experimental advances have provided numerous miRNA-target interactions, these are solely positive interactions. Generating negative examples in silico is challenging and prone to introducing biases, such as the miRNA frequency class bias identified in this work. Biases within datasets can compromise model generalization, leading models to learn dataset-specific artifacts rather than true biological patterns.
Results: We introduce a novel methodology for negative sample generation that effectively mitigates the miRNA frequency class bias. Using this methodology, we curate several new, extensive datasets and benchmark several state-of-the-art methods on them. We find that a simple convolutional neural network model, retrained on some of these datasets, is able to outperform state-of-the-art methods. This highlights the potential for leveraging unbiased datasets to achieve improved performance in miRNA binding site prediction. To facilitate further research and lower the barrier to entry for machine learning researchers, we provide an easily accessible Python package, miRBench, for dataset retrieval, sequence encoding, and the execution of state-of-the-art models.
Appears in Collections: Scholarly Works - FacBenCPM


