Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/122390
| Title: | Analysing the performance of the stack neural module network architecture on the VCR dataset |
| Authors: | Cauchi, Zachary |
| Keywords: | Artificial intelligence; Computer networks; Computer simulation |
| Issue Date: | 2024 |
| Citation: | Cauchi, Z. (2024). Analysing the performance of the stack neural module network architecture on the VCR dataset (Master's dissertation). |
| Abstract: | In the field of Artificial Intelligence (AI) vision-language tasks, Visual Commonsense Reasoning (VCR) stands out because it requires an AI model not only to predict correct answers, but also to explain why those answers were chosen. The Stack Neural Module Network (SNMN) model, while not designed to target VCR tasks, stands out for a different reason: it is a compositional model that predicts answers to Visual Question Answering (VQA) tasks via a memory stack storing the intermediate steps taken to reach a final answer. These intermediate outputs can then be visualised to better understand how the model arrives at its conclusion. This study adapts the SNMN model to predict answers and rationales in the VCR tasks — attempting to obtain an accuracy better than random guessing and at most within 20% of more recent state-of-the-art models — while still retaining use of its memory stack to provide intermediate outputs. The results do not reach state-of-the-art accuracy and show signs of overfitting, but they do suggest avenues for future work that may yet improve the model. |
| Description: | M.Sc.(Melit.) |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/122390 |
| Appears in Collections: | Dissertations - FacICT - 2024; Dissertations - FacICTCS - 2024 |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 2419ICTCPS511805073178_1.PDF | | 7.62 MB | Adobe PDF | View/Open |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
