DocEng26 Competition

DocEng26 Competition: OCRs for corpus extraction for the Maltese language

For DocEng26, the NOMOCRAT project investigators have organised a competition on building an OCR system that extracts Maltese text from an image in paragraph form (not as separate lines of text).

To participate:

  1. Download the assets you will be working with (link below).
  2. Install the packages listed in the given requirements.txt file (in the assets) on Python 3.9 (you may remove the --extra-index-url line if you are not using a GPU).
  3. Run competition_evaluator.py to test that everything works.
  4. Attempt to reproduce the baseline results by copying the contents of the example scripts into competition_transcriber.py and running competition_evaluator.py (see the comments in the example scripts for instructions).
  5. Modify your competition_transcriber.py with your own solution to the competition task (according to the rules linked below).
  6. Create a HuggingFace repo containing your model.
  7. Create a GitHub repo containing your competition_transcriber.py (which downloads the model from HuggingFace) and, if needed, a requirements.txt file listing any additional packages beyond those already included in the assets.
  8. Submit a link to your entry through the Google Form (link below) before 30th June, 23:59 AoE (Anywhere on Earth).
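
Steps 6 and 7 require competition_transcriber.py to fetch the model from HuggingFace at run time. The following is only a minimal sketch of one way this could be done, assuming the huggingface_hub package is installed (it may or may not already be in the provided requirements.txt). The repo id, the function name fetch_model_dir and the downloader parameter are hypothetical and are not part of the competition assets; the interface that competition_transcriber.py must actually expose is defined by the assets and the rules, not by this sketch.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical placeholder: replace with the id of your own HuggingFace repo.
MODEL_REPO_ID = "your-username/your-maltese-ocr-model"


def fetch_model_dir(
    repo_id: str = MODEL_REPO_ID,
    downloader: Optional[Callable[..., str]] = None,
) -> Path:
    """Return a local directory containing the files of a HuggingFace repo.

    By default this uses huggingface_hub.snapshot_download, which downloads
    the repo on first use and reuses the local cache afterwards. A different
    downloader can be passed in, for example to test the code offline.
    """
    if downloader is None:
        # Imported lazily so this module can be imported even when
        # huggingface_hub is not installed (assumption: it is installed
        # in the environment where the transcriber actually runs).
        from huggingface_hub import snapshot_download

        downloader = snapshot_download
    return Path(downloader(repo_id=repo_id))
```

Calling fetch_model_dir() once at start-up and loading the model from the returned directory keeps the GitHub repo free of large weight files, which matches the split between the two repos described in steps 6 and 7.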

https://www.um.edu.mt/projects/nomocrat/doceng26competition/
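
The submission deadline in step 8 is given in AoE (Anywhere on Earth), which corresponds to the UTC-12:00 offset. As a rough aid for participants in other time zones, the sketch below converts the deadline to UTC. The year 2026 is an assumption inferred from the "DocEng26" name, since the page states only the day and month; the names AOE, DEADLINE_AOE, DEADLINE_UTC and time_remaining are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# AoE (Anywhere on Earth) is the fixed offset UTC-12:00.
AOE = timezone(timedelta(hours=-12), name="AoE")

# Assumption: the year is 2026, inferred from the "DocEng26" name.
DEADLINE_AOE = datetime(2026, 6, 30, 23, 59, tzinfo=AOE)

# The same instant expressed in UTC.
DEADLINE_UTC = DEADLINE_AOE.astimezone(timezone.utc)


def time_remaining(now: Optional[datetime] = None) -> timedelta:
    """Return the time left until the deadline (negative once it has passed)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return DEADLINE_AOE - now


print(DEADLINE_UTC.isoformat())  # 2026-07-01T11:59:00+00:00
```

Under that assumption, the deadline falls at 11:59 UTC on 1st July, which is 13:59 in Malta (CEST, UTC+2).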