DocEng26 Competition

DocEng26 Competition: OCRs for corpus extraction for the Maltese language

For DocEng26, the NOMOCRAT project investigators organised a competition on building an OCR system that extracts Maltese text from an image in paragraph form (rather than as separate lines of text).

To participate:

Download the assets you will be working with (link below).
Install the packages listed in the given requirements.txt file (in the assets) on Python 3.9 (you may remove the --extra-index-url line if you are not using GPUs).
Run competition_evaluator.py to test that everything works.
Attempt to reproduce the baseline results by copying the content of the example scripts into competition_transcriber.py and running competition_evaluator.py (see the comments in the example scripts for instructions).
Modify your competition_transcriber.py with your own solution to the competition task (according to the rules linked below).
Create a HuggingFace repo containing your model.
Create a GitHub repo containing your competition_transcriber.py (that downloads the model from HuggingFace) and, if needed, a requirements.txt file with additional packages to install apart from what is already mentioned in the assets.
Submit a link through the Google Form (link below) before 30th June 23:59 AoE (Anywhere on Earth).
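
Two of the steps above involve small mechanical details that are easy to get wrong: removing the --extra-index-url line for a CPU-only install, and working out when the AoE deadline falls in UTC. The following is a minimal sketch of both, not part of the official assets. The function names are hypothetical, the sample requirements lines are made up, and the year 2026 is an assumption inferred from the competition name. AoE corresponds to the UTC-12:00 offset.

```python
from datetime import datetime, timedelta, timezone

# Anywhere on Earth (AoE) is the UTC-12:00 offset.
AOE = timezone(timedelta(hours=-12), "AoE")


def strip_extra_index_url(requirements_text: str) -> str:
    """Return the requirements text without any --extra-index-url lines."""
    kept = [
        line
        for line in requirements_text.splitlines()
        if not line.strip().startswith("--extra-index-url")
    ]
    return "\n".join(kept) + "\n"


def deadline_in_utc(year: int = 2026) -> datetime:
    """Convert 30 June 23:59 AoE to UTC (the default year is an assumption)."""
    return datetime(year, 6, 30, 23, 59, tzinfo=AOE).astimezone(timezone.utc)


if __name__ == "__main__":
    # Hypothetical requirements content, for illustration only.
    sample = (
        "--extra-index-url https://example.invalid/gpu-wheels\n"
        "example-package==1.0\n"
    )
    print(strip_extra_index_url(sample), end="")
    print(deadline_in_utc().isoformat())
```

To apply the first helper to the real file, one could read requirements.txt, pass its text through strip_extra_index_url, and write the result to a separate file before running pip install -r on it, leaving the original assets untouched.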

Competition description and rules

Official competition call for competitors

Assets for competitors

Submission form