Accurate name extraction from news video graphics

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/137837

Title:	Accurate name extraction from news video graphics
Authors:	Lucas, Andrea Filiberto (2025)
Keywords:	Internet videos -- Malta; Press -- Malta; Data sets -- Malta; Social media -- Malta
Issue Date:	2025
Citation:	Lucas, A. F. (2025). Accurate name extraction from news video graphics (Bachelor's dissertation).
Abstract:	The growth of video‐based news media has intensified the need for automated information extraction systems. Graphical elements such as captions and lower thirds often contain essential identifiers like personal names, yet manual extraction remains inefficient due to significant variation in design across broadcasts. This dissertation addresses the challenge of automatically extracting names from overlaid graphics in news video content, which is complicated by the diverse visual styles, typography, and spatial layouts present in such material. This research makes three primary contributions. First, it introduces the News Graphics Dataset (NGD), a custom dataset comprising annotated frames sourced from both local and foreign media. This dataset includes content from traditional broadcasts as well as social media‐based sources, capturing a wide range of graphical conventions. Second, it presents the Accurate Name Extraction Pipeline (ANEP), a modular framework for name extraction that integrates You Only Look Once (YOLO)v12‐based Object Detection (OD), Optical Character Recognition (OCR), and Named Entity Recognition (NER). Third, it offers a comparative evaluation against leading Generative Artificial Intelligence (GenAI) methods, including the Google Vision API (GVA) with Gemini 1.5 Pro and Large Language Model Meta AI (LLaMA) 4 Maverick. Empirical results demonstrate distinct performance characteristics across approaches. The GVA with Gemini 1.5 Pro achieved the highest overall performance, with an F1 score of 82.22%. However, the ANEP framework exhibited more balanced precision‐recall characteristics (72.92% and 74.44% respectively) and provided greater explainability compared to the generative models. A complementary survey of 404 respondents further validated the relevance of the research problem. Notably, 59.7% of participants reported difficulties in identifying names within news graphics, while 58.2% had paused videos specifically to identify individuals. These findings underscore both the technical viability and practical utility of automated name extraction systems, offering a solid foundation for future research in multimodal information extraction from broadcast content.
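Note on the reported metrics: the abstract gives an F1 score for the GVA with Gemini 1.5 Pro but only precision and recall for ANEP. As a rough aid for comparing the two, the sketch below derives an F1 value from ANEP's stated precision and recall using the standard harmonic-mean formula. The function and variable names are illustrative assumptions, and the derived figure is not one reported in the abstract; if the dissertation averages its metrics per class or per frame, its own ANEP F1 may differ.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both precision and recall are 0
    return 2 * precision * recall / (precision + recall)


# Precision and recall stated for ANEP in the abstract.
anep_precision = 0.7292
anep_recall = 0.7444

# Derived estimate only (assumption: a single aggregate precision/recall pair).
anep_f1_estimate = f1_score(anep_precision, anep_recall)

# F1 stated in the abstract for the GVA with Gemini 1.5 Pro.
gva_gemini_f1 = 0.8222

print(f"ANEP F1 (derived): {anep_f1_estimate:.2%}")
print(f"Gap to GVA with Gemini 1.5 Pro: {gva_gemini_f1 - anep_f1_estimate:.2%}")
```

Under these assumptions the derived ANEP F1 comes out at roughly 73.7%, about 8.5 percentage points below the generative baseline.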
Description:	B.Sc. (Hons) ICT(Melit.)
URI:	https://www.um.edu.mt/library/oar/handle/123456789/137837
Appears in Collections:	Dissertations - FacICT - 2025; Dissertations - FacICTAI - 2025

Files in This Item:

File	Description	Size	Format
2508ICTICT390900017401_1.PDF (Restricted Access)		11.5 MB	Adobe PDF
