Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/141889
Title: Comparing linguistic and visuo-linguistic representations for noun-noun compound relation classification in English
Authors: Lang, Inga (2021)
Keywords: English language
Linguistics
Grammar, Comparative and general -- Noun
Multimodal user interfaces (Computer systems)
Issue Date: 2021
Citation: Lang, I. (2021). Comparing linguistic and visuo-linguistic representations for noun-noun compound relation classification in English (Master's dissertation).
Abstract: Noun-noun compounds (NNCs), such as ‘restaurant owner’ and ‘city morgue’, are very frequent in the English language, and new ones are created regularly due to the high productivity of compounding as a word formation process. To fully understand the meaning of an NNC, we need not only to know the meaning of its parts, but also to deduce the implicit semantic relationship between them. That is, we need to understand whether ‘city morgue’ means ‘a morgue made of cities’, ‘a morgue located in a city’ or something else entirely. Humans have clear intuitions about what relations can hold between the constituents of an NNC, but interpreting NNCs in a computational setting is a challenge. Accurate NNC processing is crucial for the advancement of many natural language processing tasks, including machine translation, text summarization, and natural language inference. Previous methods of computational NNC interpretation have been limited to approaches involving textual representations and linguistic features. However, research from both cognitive science and natural language processing suggests that grounding linguistic representations in vision or other modalities can increase performance on this and other tasks. Supported by findings about human conceptual combination as well as theories of symbol grounding, our work is a novel comparison of linguistic and visuo-linguistic representations for the task of NNC interpretation. We frame NNC interpretation as a relation classification task, evaluating our approaches on a large annotated NNC dataset with over 19,000 relationally-annotated compounds (Tratz, 2011). We pursue two lines of experiments: one explores word2vec (Mikolov et al., 2013a) embeddings, compositionally combined into NNC representations in various ways, as inputs to an SVM classifier; the other fine-tunes a BERT model with a classifier layer on top.
In both settings, we experiment with combining the textual representations with visual feature vectors obtained with a ResNet (He et al., 2016) model on images from ImageNet (Deng et al., 2009). We find that adding visual features increases performance on almost all data configurations in our SVM experiments, and that the results are statistically significant in some cases. In our BERT experiments, we find that BERT performs well on coarse-grained test data that may include previously seen constituents, but performs poorly on all other data configurations. However, adding raw ResNet feature vectors does increase BERT’s performance in the remaining settings, while normalized ResNet feature vectors contribute little or no improvement. Our findings suggest that a visually grounded approach to NNC interpretation is a promising venture, and we view our novel approach as an encouraging starting point for further investigations into multimodal NNC processing.
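The SVM line of experiments described in the abstract can be sketched as follows. This is a minimal illustrative reconstruction, not the dissertation's actual pipeline: `pseudo_word2vec` and `pseudo_resnet` are hypothetical stand-ins for real word2vec lookups and ResNet image features, the relation labels are invented examples, and concatenation is just one of the compositional choices the abstract mentions.

```python
import zlib

import numpy as np
from sklearn.svm import LinearSVC

def pseudo_word2vec(word, dim=50):
    # Stand-in for a real word2vec lookup: a deterministic random vector
    # keyed on the word via a stable CRC32 hash (illustrative only).
    return np.random.default_rng(zlib.crc32(word.encode())).standard_normal(dim)

def pseudo_resnet(word, dim=64):
    # Stand-in for ResNet features of ImageNet images depicting the word.
    return np.random.default_rng(zlib.crc32(word.encode()) + 1).standard_normal(dim)

def compound_representation(modifier, head, visual=True):
    # One simple compositional choice: concatenate the two constituent
    # embeddings; addition or element-wise product are alternatives.
    rep = np.concatenate([pseudo_word2vec(modifier), pseudo_word2vec(head)])
    if visual:
        # Visuo-linguistic variant: append visual features for both nouns.
        rep = np.concatenate([rep, pseudo_resnet(modifier), pseudo_resnet(head)])
    return rep

# Toy training set: (modifier, head) pairs with hypothetical relation labels.
compounds = [("city", "morgue"), ("restaurant", "owner"),
             ("glass", "table"), ("steel", "bridge")]
labels = ["LOCATED", "AGENT", "MATERIAL", "MATERIAL"]

X = np.stack([compound_representation(m, h) for m, h in compounds])
clf = LinearSVC().fit(X, labels)
pred = clf.predict([compound_representation("city", "morgue")])
```

With real embeddings, the classifier would be trained on the ~19,000 annotated compounds of the Tratz (2011) dataset, and the "with visual features" and "without visual features" configurations would differ only in the `visual` flag above.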
Description: M.Sc. (HLST)(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/141889
Appears in Collections:Dissertations - FacICT - 2021
Dissertations - FacICTAI - 2021
Dissertations - FacSoW - 2024

Files in This Item:
File: 2118ICTCSA531005071758_1.PDF (Restricted Access)
Size: 5.37 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.