Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/122390
| Title: | Analysing the performance of the stack neural module network architecture on the VCR dataset |
| Authors: | Cauchi, Zachary |
| Keywords: | Artificial intelligence; Computer networks; Computer simulation |
| Issue Date: | 2024 |
| Citation: | Cauchi, Z. (2024). Analysing the performance of the stack neural module network architecture on the VCR dataset (Master's dissertation). |
| Abstract: | In the field of Artificial Intelligence (AI) vision-language tasks, Visual Commonsense Reasoning (VCR) stands out because it requires an AI model not only to predict correct answers, but also to explain why those answers were chosen. The Stack Neural Module Network (SNMN) model, while not designed to target VCR tasks, stands out for a different reason: it is a compositional model that predicts answers to Visual Question Answering (VQA) tasks via a memory stack storing the intermediate steps taken to reach a final answer. These intermediate outputs can then be visualised to better understand how the model arrives at its conclusion. This study adapts the SNMN model to predict answers and rationales in the VCR tasks — attempting to obtain an accuracy better than random guessing and at most within 20% of more recent state-of-the-art models — while still retaining use of its memory stack to provide intermediate outputs. The results do not reach state-of-the-art accuracy and show signs of overfitting, but they do suggest avenues for future work that may yet improve the model. |
| Description: | M.Sc.(Melit.) |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/122390 |
| Appears in Collections: | Dissertations - FacICT - 2024; Dissertations - FacICTCS - 2024 |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 2419ICTCPS511805073178_1.PDF | | 7.62 MB | Adobe PDF | View/Open |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
