Title: On "seeing" what we mean: Questioning the grounding capabilities of large-scale Vision-Language models
Date: Wednesday, 16 December 2020
Time: 12:00 - 13:00
Venue: Remotely; register online.
Host: Prof. Albert Gatt - University of Malta
Grounding symbolic representations in non-symbolic data is an old problem in Artificial Intelligence. Over the past few years, advances in Computer Vision and Natural Language Processing (NLP), both relying heavily on neural methods, have made it possible to develop models that ground language in visual information at an unprecedented scale.
But do such models really succeed at grounding symbolic expressions, such as words and phrases, in perceptual information (for example, visual features)?
In this talk, I will first survey the landscape of Vision and Language tasks in NLP, and then focus on two related questions:
The results of this work suggest that substantial ground remains to be covered before we can truly claim that language is being effectively linked to perceptual data.