<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>OAR@UM Collection:</title>
  <link rel="alternate" href="https://www.um.edu.mt/library/oar/handle/123456789/121802" />
  <subtitle />
  <id>https://www.um.edu.mt/library/oar/handle/123456789/121802</id>
  <updated>2026-04-05T05:52:31Z</updated>
  <dc:date>2026-04-05T05:52:31Z</dc:date>
  <entry>
    <title>Analysing the performance of the stack neural module network architecture on the VCR dataset</title>
    <link rel="alternate" href="https://www.um.edu.mt/library/oar/handle/123456789/122390" />
    <author>
      <name />
    </author>
    <id>https://www.um.edu.mt/library/oar/handle/123456789/122390</id>
    <updated>2024-05-17T08:26:17Z</updated>
    <published>2024-01-01T00:00:00Z</published>
    <summary type="text">Title: Analysing the performance of the stack neural module network architecture on the VCR dataset
Abstract: Among Artificial Intelligence (AI) vision-language tasks, Visual Commonsense Reasoning (VCR) stands out as an interesting case because it requires an AI model not only to predict correct answers, but also to explain why those answers were chosen. The Stack Neural Module Network (SNMN) model, while not designed for VCR tasks, also stands out, for a different reason: it is a compositional model that predicts answers to Visual Question Answering (VQA) tasks via a memory stack that stores the intermediate steps taken to reach a final answer. These intermediate outputs can then be visualised to better understand how the model arrives at its conclusion. This study adapts the SNMN model to predict answers and rationales in VCR tasks, aiming for an accuracy better than random guessing and at most within 20% of more recent state-of-the-art models, while still retaining its memory stack to provide intermediate outputs. The results do not reach state-of-the-art accuracy and show signs of overfitting, but they do suggest avenues for future work that may yet improve the model.
Description: M.Sc.(Melit.)</summary>
    <dc:date>2024-01-01T00:00:00Z</dc:date>
  </entry>
</feed>