Linguistics Circle occasional seminar series

The Institute of Linguistics organises an occasional series of seminars, where local and international speakers are invited to present the results of recent research which is of interest to both professional (computational) linguists and members of the broader public.
The seminars take place in an informal setting and their primary aim is to encourage discussion. Students and members of the public are welcome to attend!

Current Seminars (2017-2018):
I am always looking for speakers for the Linguistics Circle Seminar. Please contact Lonneke van der Plas if you have something to present.

Lonneke van der Plas, University of Malta

Title: Compounds, from analysis to generation

Date: Friday 15 September 2017 at 12:00 hrs

Venue: GW154

Compounds, such as 'bear country' or 'dry bag', are a frequent phenomenon with low token counts. For example, Baroni et al. (2002) show that for a corpus of German newswire almost half of the word types are compounds, while at the same time most individual examples of compounds are infrequent. Compounding is a productive word-formation process, which explains these statistics. Being abundant as a general process but scarce in terms of individual examples makes the analysis of compounds particularly problematic for statistical techniques that need high token frequencies to make accurate predictions. Since data sparsity is expected to lead to low performance, previous work in NLP has been concerned with addressing compound analysis compositionally by recursively analysing a compound’s immediate constituents.
In the first part of this talk, I will discuss compoundhood and inspect the relevance of various established linguistic criteria. I will show that languages show a lot of variation when it comes to compounding. Some languages, such as German, use closed compounding (i.e., they create one-word compounds, e.g., Todesstrafe ‘death penalty’), whereas others do not. In Romance languages, such as French, compounding is not as productive, and instead complex nominals (e.g., peine de mort ‘death penalty’) are used.
Then, I will discuss work recently undertaken on the structural and semantic analysis of compounds, where we try to avoid using manually built resources, such as morphological analysers or annotated corpora, and rely mostly on naturally occurring supervision (Snyder and Barzilay, 2010) acquired from parallel corpora. For example, we exploit closed compounding in Germanic languages to identify compounds in English parallel text.
In the last part of my talk, I will shift the focus from analysis to generation and discuss the potential of compounds as vehicles for creative thought. Compounding allows us to combine lexemes and create novel concepts in a highly flexible manner. The relations between the components of a compound are covert, and people are very creative when it comes to describing the relation between these components, even when the relevant lexemes are randomly combined. Since, in contrast to compound analysis, novel compound generation has received very little attention in the NLP community, I will survey previous work on novel compound interpretation from the psycholinguistics literature and show some first results from our ongoing work.

M. Baroni, J. Matiasek, and H. Trost. 2002. Predicting the Components of German Nominal Compounds. In Proceedings of ECAI.

B. Snyder and R. Barzilay. 2010. Climbing the Tower of Babel: Unsupervised Multilingual Learning. In Proceedings of ICML.

Shiloh Drake, University of Arizona

Title: L1 biases in learning root-and-pattern morphology

Date: Thursday 19th October 2017 at 15:00 hrs

Venue: OH112

This work explores how native language biases affect how root-and-pattern morphology is learned in the lab. While researchers have shown that non-adjacent dependencies, such as those found in vowel harmony, root-and-pattern morphology, or verb agreement, are difficult for English and French native speakers to track and learn (e.g., Gómez, 2002; Newport & Aslin, 2004; Bonatti et al., 2005), other work shows that if a speaker’s native language requires them to attend to non-adjacent dependencies, they will be able to learn an artificial grammar that employs analogous non-adjacent dependencies (LaCross, 2011, 2015). To this end, my work uses participants from four speaker groups, each with different amounts and types of exposure to root-and-pattern morphology, to see whether performance on the artificial grammar is modulated by exposure.

Preliminary results from this research show that an artificial grammar employing root-and-pattern morphology is more difficult for native English speakers to learn than an artificial grammar employing concatenative morphology (similar to work by LaCross (2015) and Newport & Aslin (2004)). While English speakers must track other types of linguistic non-adjacent dependencies (e.g., syntactic non-adjacent dependencies, such as verb agreement or auxiliary agreement), they do not generally have to track phonological non-adjacent dependencies (such as vowel harmony) or morphological non-adjacent dependencies (such as roots and patterns). Arabic speakers learn the non-concatenative grammar more accurately, and are able to pick up on more fine-grained patterns than the English speakers in the Wug Test.

Jessica Nieder, University of Düsseldorf

Title: Investigating the plural formation in Maltese with naive discriminative learning

Date: Wednesday 1st November 2017 at 12:00 hrs

Venue: FEMA419

The complexity of the plural formation in Maltese raises the question as to what Maltese native speakers use as a basis to set up singular-plural mappings. We hypothesize that the phonotactics of the singular determines the choice of the plural form. To explore this, we conducted a corpus study and a production experiment, and used several cue implementations in naive discriminative learning (NDL; Baayen et al., 2011).
A corpus of 2373 Maltese singular-plural pairs compiled from two different sources (Schembri, 2012; Gatt and Čéplö, 2013) served as the basis for a production experiment in which we asked native speakers to produce plural forms for existing and phonotactically legal nonce singular nouns. Nonce forms were constructed based on words of our corpus by systematically changing either the consonants or the vowels or both. The nouns used as a base for the changes had either a broken plural form, a sound plural form or both plural forms. The results show that participants produced significantly more sound plural forms for nonces based on sound plural words and significantly more broken plurals for nonces based on broken plural words.

We modeled our experimental results with the Naive Discriminative Learner, a computational model of morphological processing (Baayen et al., 2011). The NDL model was trained on the nouns of our corpus. We then fitted the model to our experimental data to see if NDL would classify our nonces and existing words in the same way the participants of the production experiment did. An NDL model using bigrams as cues showed an acceptable prediction for both plural types. An abstraction of consonants and vowels into “C” and “V” in the model led to worse performance, which supports our assumption that consonant and vowel identity is important for the generalizations of Maltese plurals. Moreover, we conclude that the phonotactics of the singular determines the plural form, as bigrams correspond to the syllable structure of Maltese nouns.
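
As a rough illustration of the two cue representations contrasted above, the following Python sketch extracts letter bigrams from a word form and, alternatively, reduces the form to a "C"/"V" skeleton first. It is not the authors' implementation: the function names, the `#` word-boundary padding, and the simplified vowel set (which treats every letter as one segment and ignores Maltese digraphs such as "għ" and "ie") are assumptions made purely for illustration.

```python
# Illustrative sketch only; names and simplifications are assumptions,
# not part of the NDL software or the study described above.

VOWELS = set("aeiou")  # simplified: one letter = one segment


def bigram_cues(form):
    """Return the letter bigrams of a form, padded with '#' word boundaries."""
    padded = f"#{form}#"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def cv_skeleton(form):
    """Replace each vowel letter with 'V' and every other letter with 'C'."""
    return "".join("V" if ch in VOWELS else "C" for ch in form.lower())


if __name__ == "__main__":
    for noun in ("kelb", "qalb"):
        print(noun, bigram_cues(noun), bigram_cues(cv_skeleton(noun)))
```

With identity-preserving bigrams, kelb 'dog' and qalb 'heart' yield partly different cue sets, whereas both reduce to the same skeleton CVCC and therefore to identical abstract cues. This is one way to see why the abstraction discards information a learner could otherwise use.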


Jiří Dvořák, University of Defence Language Centre, Brno

Title: Crisis Management Corpus Terminology Study. (Why a hazard need not be a threat, a threat need not result in a high risk and a risk need not cause any crisis).

Date: Wednesday 8th November 2017 at 12:00 hrs 

Venue: DGZ101

A process as complex as crisis management (CM) can only be discussed and understood successfully if there is agreement on the meanings of its essential terminology. New terminology used in specialist publications on crisis management and risk management reflects rapid progress in these areas. Significant changes have also been taking place in the theory and practice of lexicology, as a result of new technology for data processing and text-based research. Dictionaries are founded on authentic usages of words, their collocations and the semantic domains they are associated with. However, large, complex units of meaning are often absent from dictionaries and terminological glossaries. Therefore, the findings of a crisis management corpus terminology study could meet with a positive response among both linguistic and management specialists. The presenter will introduce the concepts of selected CM terms, i.e. hazard, threat, emergency, risk, and crisis, together with the areas to which the terms are most frequently related. Data collected in a computerized crisis management corpus and processed with the help of Sketch Engine, a web-based program, will be presented as well.



Georgios N. Yannakakis, Institute of Digital Games, University of Malta

Title: AI designing games for us with (or without) us

Date: Wednesday 6th December 2017 at 12:00 hrs

Venue: GW164

Can computational processes and machine-crafted artifacts be considered creative? When does this happen and who judges after all? What happens when we create together with a creative machine? Do we merely create together or can a machine truly foster our creativity as human creators? When does such co-creation foster the creativity of both humans and machines? In this talk I will address the above questions by positioning computer games as the ideal application domain for artificial intelligence because of the unique features they offer. Computational game creativity is placed at the intersection of developing fields within games research and long-studied fields within AI such as computational art and interactive narrative. Advanced methods for procedural content generation and AI-assisted game design will be showcased via a plethora of projects currently running at the Institute of Digital Games.

Past seminars
(Seminars from previous years: 2015-2016, 2014-2015, 2013-2014, 2012-2013, 2011-2012, 2010-2011)

Adam Ussishkin, University of Arizona

Title: What can auditory masked priming tell us about the role of morphology in auditory word recognition?

Date: Wednesday 8 March 2017 at 13:00 hrs

Venue: DGZ108

Words consist of a phoneme or letter sequence that maps onto meaning. Most prominent theories of word recognition (auditory and visual) portray the recognition process as a connection between these small units and a semantic level. However, there is a growing body of evidence in the priming literature suggesting that there is an additional, morphological level that mediates the recognition process. In morphologically linear languages like English, morphemes and letter or sound sequences are co-extensive, so the source of priming effects between related words could be due to simple phonological overlap as opposed to morphological overlap. In Semitic languages, however, the non-linear morphological structure of words reduces this confound, since the morphemes are interdigitated in a non-linear fashion. Semitic words are typically composed of a discontiguous root (made up of three consonants) embedded in a word pattern specifying the vowels and the ordering between consonants and vowels. Active-passive pairs of Semitic verbs in Maltese illustrate this relationship (here the root is f-t-ħ); e.g., fetaħ ‘open’-miftuħ ‘opened’.
In this talk, I report on a number of experiments our lab has carried out in Maltese and Hebrew investigating the extent to which the non-linear morphemes used in Semitic facilitate auditory word recognition, and to what extent potential priming effects are independent of the phonological overlap typically inherent in morphological relationships. These experiments make use of the auditory masked priming technique (Kouider and Dupoux, 2005). I show that not only do roots facilitate auditory word recognition in these languages, but that these morphological effects are independent of phonological overlap effects.

Andy Wedel, University of Arizona

Title: Signal evolution within the word

Date: Friday 10 March 2017 at 13:00 hrs

Venue: GW214

Languages appear to optimize the amount of segmental information allocated to words: words that are less contextually predictable tend to be longer, while words that are more predictable tend to be shorter (Zipf 1935, Piantadosi et al. 2011). In this talk I'll show evidence from English that the average informativity of segments inside of words is also optimized.

Listeners incrementally process words as they are heard, progressively updating inferences about what word is intended as the phonetic signal unfolds in time. As a consequence, phonetic cues at the beginning of a word are more informative about word-identity, because they are less predicted by previous segmental context. This suggests that languages should not only optimize the amount of segmental information in words, but also optimize the distribution of that information across the word. Specifically, words that are on average less predictable in context should contain more highly-informative segments overall, and also position more informative segments earlier in the word. 

More generally, we know that languages show a strong tendency to develop phonological patterns which enhance phonetic cues at word beginnings, while reducing cues later in words. I will argue that this typological tendency plausibly arises from the word-level phenomena described here.

Dominique Lagorgette, Université de Savoie-Mont Blanc

Title: French medieval corpora: including the diatopic and diastratic variation in French medieval linguistic studies - a methodologic approach

Date: Wednesday 15 March 2017 at 12:00 hrs

Venue: DGZ108

Medieval French studies developed linguistic approaches over the last century mostly by studying literary texts. Considering the rarity of data in Early Old French (9th-11th C.) and then the primacy of courtly literary texts until the 13th-14th C., a history of speech registers (diastratic variation) still remains to be written: how do we know if a word was felt as vulgar, acceptable or taboo? How do we measure the pragmatic force of transgressive speech? This paper will question the methodology necessary for such research, i.e. the relevant corpora, theoretical framework and treatment of data.

Roman Klinger, University of Stuttgart

Title: An Empirical, Quantitative Analysis of Differences between Sarcasm and Irony

Date: Friday 17 March 2017 at 12:00 hrs

Venue: GW214

A variety of classification approaches for the detection of ironic or sarcastic messages has been proposed in the last decade to improve sentiment classification. However, despite the availability of psychologically and linguistically motivated theories regarding the difference between irony and sarcasm, these typically do not carry over to a use in predictive models; one reason might be that these concepts are often considered very similar.
In this presentation, I discuss a data-driven, empirical analysis of Tweets and how authors label them as irony or sarcasm. The experiments suggest that authors do not use these concepts interchangeably and that differences can be detected with automatic methods with a surprisingly high accuracy.

Uwe Reyle, University of Stuttgart

Title: Perfect Tense Forms in English, German and French

Date: Wednesday 29 March 2017 at 12:00 hrs

Venue: GW156

Most of the literature on English tense and aspect (and therewith on tense and aspect simpliciter) is on the English Present Perfect. The facts that make this form so special, and that make it stick out almost like a sore thumb for the theorist, are easy to describe, but coming up with a theoretically satisfactory account of it is much more involved. We will present a result-state based, compositional analysis that accounts for the peculiarities of the English Present Perfect and explains why this form is different from other perfect forms in English on the one hand and from its counterparts in German and French on the other.

Maris Camilleri, University of Vienna

Title: Relative clause types in Maltese

Date: Friday 7 April 2017 at 12:00 hrs

Venue: GW214

This talk will take us on a descriptive tour across a number of different types of relative clauses available in Maltese, and a number of strategies which the language makes use of in order to syntactically construct such clause types. We will specifically highlight the split which exists, as well as the lack of it, between the complementiser li and wh-pronoun strategies, and amongst other things illustrate how a type of wh-pronoun relative clause can permit internally-headed antecedents. Counter to claims in the literature (predominantly Caponigro (2003) and subsequent publications), we demonstrate that Maltese free relative clauses are not constrained to the wh-pronoun strategy. Rather, definite free relative clauses can also be introduced by the complementiser strategy.

Holger Mitterer, University of Malta

Title: Production and Perception of Maltese root consonants

Date: Friday 12 May 2017 at 12:00 hrs

Venue: GW214

Maltese, as an originally Semitic language, uses verbs based on tri-consonantal roots. In this talk, I will focus on two challenges these provide in speech production and perception. First, the three consonants are in sequence in the present tense plural, leading to articulatorily difficult clusters (e.g., k-t-b, Engl., to write, jiktbu, Engl., they write), and secondly, the middle root consonant is geminated to express a causative (e.g., w-q-f, Engl., to stop, waqaf quddiem il-hanut, Engl., he stopped in front of the store, waqqaf il-karrozza, Engl., he stopped the car). Such forms were elicited in a sentence-guessing task with a picture prime (to avoid reading in a production task) and analysed using forced alignment. The results showed that root consonants are quite resilient against reduction/deletion, and that this even leads to vowel transpositions, putatively to prevent reduction (se jibdlu → sejbidlu, Engl., they will change). For the singleton-geminate distinction, the results show that, next to duration, laryngeal geminates in particular have additional cues that cannot be easily explained as a consequence of the increased prosodic weight of geminates. Perception experiments show that listeners strongly rely on these cues. This provides additional evidence that phonological features are unlikely to be involved in prelexical speech processing (cf. Mitterer, Kim, & Cho, 2016; Reinisch & Mitterer, 2016), because the realization of [+LONG] depends on place of articulation. Finally, I will present data showing that the singleton-geminate distinction is rate-dependent in both perception and production, contrasting with recent views that rate-dependencies may not be pervasive in speech processing.

Catherine Pelachaud,  LTCI, TELECOM ParisTech

Title: Modeling conversational nonverbal behaviors for virtual characters

Date: Wednesday 26th of October at 12:00 hrs 

Venue: GWHD1

In this talk I will present our on-going effort to model virtual characters with nonverbal capacities.
We have been developing Greta, an interactive Embodied Conversational Agent (ECA) platform. It is endowed with socio-emotional and communicative behaviors. Through its behaviors, the agent can sustain a conversation as well as show various attitudes and levels of engagement.

The ECA is able to display a large variety of multimodal behaviors to convey communicative intentions. We rely on a lexicon whose entries are defined as temporally coordinated multimodal signals. At run time, the signals for given communicative intentions and emotions are instantiated and their animations realized. Communicative behaviors are not produced in isolation from one another. We have developed models that generate sequences of behaviors; that is, behaviors are not instantiated individually, but the surrounding behaviors are taken into account.


