Title: Semantic similarity and the generation of referring expressions: a first report
Authors: Gatt, Albert
van Deemter, Kees
Keywords: Semantic Web
Issue Date: 2005
Publisher: Association for Computational Linguistics
Citation: Gatt, A., & van Deemter, K. (2005). Semantic similarity and the generation of referring expressions: a first report. 6th International Workshop on Computational Semantics, Tilburg. 1-5.
Abstract: The past decade has witnessed renewed interest in the Generation of Referring Expressions (GRE) [23, 24, 8, 9, 10, 12, 22]. Broadening the scope beyond earlier work [3, 4, 5], recent proposals involve algorithms that refer to sets as well as individuals, using operations such as set union (‘the cat and the dogs’) and complementation (‘the dog that is not black’). As a consequence, it has become more difficult for a generator to choose among alternative expressions that may be coextensive. This paper is part of a concerted effort to shed some empirical light on the question of expressive choice. The focus is on reference to sets, where a referring expression is built by unifying two or more singletons. Starting with descriptions of the form ‘the N1 and (the) N2’, we investigate whether the semantic similarity of N1 and N2 is relevant in determining the acceptability of the generated NP. Suppose that, in a given domain, an entity e1 can be referred to as either ‘the postgraduate’ or ‘the psychologist’; similarly, e2 can be referred to as either ‘the undergraduate’ or ‘the man on the first floor’. Various alternatives exist for an expression referring to {e1, e2}, e.g.: (i) ‘the postgraduate and the man on the first floor’, (ii) ‘the postgraduate and the undergraduate’, (iii) ‘the psychologist and the undergraduate’. Here, (ii) is arguably better than (i) or (iii). Intuitively, this is because the conjuncts in (ii) are more semantically similar or ‘related’. Moreover, expression (iii) violates the Gricean maxims. The choice of two equally specific [2] but semantically unrelated descriptors, ‘psychologist’ for e1 versus ‘undergraduate’ for e2, might give rise to (false) implicatures, such as that the two entities have nothing in common, thus violating the Gricean Cooperative Principle, and resulting in a description which is less coherent than it might be.
Suppose further that e1 and e2, as well as a third entity e3 referred to as ‘the book’, were introduced in a discourse. Subsequent reference to a pair of these entities might be made via a coordinate construction, or some other structure. Considerations of semantic similarity may guide the choice between alternatives; in particular, referring to the set {e1, e2} using an NP conjunction is more felicitous than a similar reference to {e1, e3} (‘the psychologist and the book’). In the latter case, it may be more felicitous to refer to these two entities using different phrases. A third consideration has to do with a user’s comprehension of a generated text. If a description gave rise to false implicatures, or simply sounded odd as a result of an infelicitous choice of descriptors, the quality of the text and its comprehensibility would be reduced. We next describe a correlational study which investigated the relationship between semantic similarity and the perceived acceptability of conjoined NPs. Our study is closely related in spirit to [13], which also evinces a concern with semantic plausibility and its implications for NLG, albeit in a different domain.
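The choice among coextensive conjunctions described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the taxonomy below is hand-built for illustration, the path-based score is a swapped-in stand-in for whatever similarity measure the study used, and the names PARENT, ancestors and similarity are hypothetical. The phrase ‘the man on the first floor’ is represented by its head noun ‘man’, which is also an assumption.

```python
# Hypothetical hand-built taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "entity": None,
    "person": "entity",
    "artifact": "entity",
    "student": "person",
    "professional": "person",
    "man": "person",
    "postgraduate": "student",
    "undergraduate": "student",
    "psychologist": "professional",
    "book": "artifact",
}


def ancestors(noun):
    """Return the chain from a noun up to the taxonomy root, noun first."""
    chain = [noun]
    while PARENT[chain[-1]] is not None:
        chain.append(PARENT[chain[-1]])
    return chain


def similarity(n1, n2):
    """Toy path-based similarity: 1 / (1 + edges between n1 and n2)."""
    up1, up2 = ancestors(n1), ancestors(n2)
    for steps1, node in enumerate(up1):
        if node in up2:
            # First shared node walking up from n1 is the lowest common ancestor.
            return 1.0 / (1 + steps1 + up2.index(node))
    return 0.0  # no common ancestor (cannot happen in a single-rooted tree)


# Candidate head-noun pairs for {e1, e2}, mirroring (i), (ii), (iii).
candidates = [
    ("postgraduate", "man"),
    ("postgraduate", "undergraduate"),
    ("psychologist", "undergraduate"),
]

# Prefer the conjunction whose conjuncts are most similar.
ranked = sorted(candidates, key=lambda pair: similarity(*pair), reverse=True)
for n1, n2 in ranked:
    print(f"the {n1} and the {n2}: {similarity(n1, n2):.2f}")
```

Under these assumptions the pair in (ii) ranks first, and ‘psychologist’ with ‘book’ scores lower than any of the three person pairs, in line with the intuition that conjoining them is less felicitous. A different taxonomy or measure could of course order the candidates differently.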
