Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/138425
| Title: | miRBench : novel benchmark datasets for microRNA binding site prediction that mitigate against prevalent microRNA frequency class bias |
| Authors: | Sammut, Stephanie; Gresova, Katarina; Tzimotoudis, Dimosthenis; Marsalkova, Eva; Cechak, David; Alexiou, Panagiotis |
| Keywords: | MicroRNAs; Genetic regulation; Computational biology; Machine learning; Benchmarking (Management); Bioinformatics -- Data processing |
| Issue Date: | 2025 |
| Citation: | Sammut, S., Gresova, K., Tzimotoudis, D., Marsalkova, E., Cechak, D., & Alexiou, P. (2025). miRBench: novel benchmark datasets for microRNA binding site prediction that mitigate against prevalent microRNA frequency class bias. bioRxiv, 1-12. |
| Abstract: | Motivation: MicroRNAs (miRNAs) are crucial regulators of gene expression, but the precise mechanisms governing their binding to target sites remain unclear. A major contributing factor to this is the lack of unbiased experimental datasets for training accurate prediction models. While recent experimental advances have provided numerous miRNA-target interactions, these are solely positive interactions. Generating negative examples in silico is challenging and prone to introducing biases, such as the miRNA frequency class bias identified in this work. Biases within datasets can compromise model generalization, leading models to learn dataset-specific artifacts rather than true biological patterns. Results: We introduce a novel methodology for negative sample generation that effectively mitigates the miRNA frequency class bias. Using this methodology, we curate several new, extensive datasets and benchmark several state-of-the-art methods on them. We find that a simple convolutional neural network model, retrained on some of these datasets, is able to outperform state-of-the-art methods. This highlights the potential for leveraging unbiased datasets to achieve improved performance in miRNA binding site prediction. To facilitate further research and lower the barrier to entry for machine learning researchers, we provide an easily accessible Python package, miRBench, for dataset retrieval, sequence encoding, and the execution of state-of-the-art models. |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/138425 |
| Appears in Collections: | Scholarly Works - FacBenCPM |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| miRBench_novel_benchmark_datasets_for_microRNA_binding_site_prediction_that_mitigate_against_prevalent_microRNA_Frequency_Class_Bias(2024).pdf | | 1.54 MB | Adobe PDF | View/Open |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
