Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/122390
Title: Analysing the performance of the stack neural module network architecture on the VCR dataset
Authors: Cauchi, Zachary (2024)
Keywords: Artificial intelligence
Computer networks
Computer simulation
Issue Date: 2024
Citation: Cauchi, Z. (2024). Analysing the performance of the stack neural module network architecture on the VCR dataset (Master's dissertation).
Abstract: In the field of Artificial Intelligence (AI) vision-language tasks, Visual Commonsense Reasoning (VCR) stands out because it requires an AI model not only to predict correct answers, but also to explain why those answers were chosen. The Stack Neural Module Network (SNMN) model, while not designed for VCR tasks, stands out for a different reason: it is a compositional model that tries to predict answers to Visual Question Answering (VQA) tasks using a memory stack that stores the intermediate steps taken to reach a final answer. These intermediate outputs can then be visualised to better understand how the model tries to arrive at its conclusion. This study adapts the SNMN model to predict answers and rationales in the VCR tasks, aiming for an accuracy better than random guessing and within 20% of more recent state-of-the-art models, while still retaining its memory stack to provide intermediate outputs. The results do not reach state-of-the-art accuracy and show signs of overfitting, but they suggest avenues for future work that may yet improve the model.
Description: M.Sc.(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/122390
Appears in Collections: Dissertations - FacICT - 2024
Dissertations - FacICTCS - 2024

Files in This Item:
File: 2419ICTCPS511805073178_1.PDF
Size: 7.62 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.