Study-Unit Description


TITLE Corpus Linguistics

UM LEVEL 03 - Years 2, 3, 4 in Modular Undergraduate Course



DEPARTMENT Institute of Linguistics and Language Technology

DESCRIPTION Learning Objectives:

Corpus Linguistics is the study of language through the use of corpora, i.e. large archives of linguistic data (such as texts and speech transcriptions). Over the past decades, Corpus Linguistics has emerged as an important paradigm in the study of languages, not only because it has helped to place linguistic theory on a sounder empirical footing, but also because it has challenged a number of standard assumptions that underpin some work in theoretical linguistics. This study-unit aims to introduce participants to the general field of Corpus Linguistics and its methods, and to a number of topics that have benefited from the study of linguistic corpora. In so doing, it will encourage them to critically assess some of the basic assumptions in this field, in comparison to those made in more theoretically oriented work.

Content covered

The study-unit will consist of lectures and tutorials, the latter often involving students in hands-on practical work using existing corpora and corpus analysis tools.

Part I: Foundational Issues
1. What is a corpus? A brief history of corpus linguistics.
a. Corpus linguistics and generative grammar: a critical comparison of the basic tenets of grammatical theory, and how these have been challenged by the advent of corpus-based approaches to language.

2. Types of corpora (reference corpora, parallel corpora, web corpora, multilingual corpora, etc.)
a. Specific examples of existing corpora (e.g. the British National Corpus, the Maltese Language Resource Server)

3. Issues in the construction and design of linguistic corpora.
a. Representativeness and corpus design.
b. Linguistic annotation of a corpus.

4. Conducting linguistic analysis using corpora
a. Basic statistical tools (frequencies and distributions)
b. An introduction to some useful software for corpus analysis.

Part II: Applications
5. Corpus-based lexicography
6. Semantics: what collocations and idioms in a corpus can tell us about meaning.
7. Corpora and grammar: the use of corpora to discover grammatical regularities, and to determine degrees of grammaticality of different constructions.
8. Corpora and stylistics: the use of electronic data to discover the determinants of a particular style (e.g. formal vs. informal) in language.

Reading List

- McEnery, T. and Wilson, A. (2001). Corpus Linguistics (2nd ed.). Edinburgh: Edinburgh University Press.
- Biber, D., Conrad, S. and Reppen, R. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.

In addition to the above, students will also be assigned readings which will be made available throughout the course.


Assessment Component/s   Assessment Due   Sept. Asst Session   Weighting
Assignment               SEM2             Yes                  100%

LECTURER/S Stavros Assimakopoulos


The University makes every effort to ensure that the published Courses Plans, Programmes of Study and Study-Unit information are complete and up-to-date at the time of publication. The University reserves the right to make changes in case errors are detected after publication.
The availability of optional units may be subject to timetabling constraints.
Units not attracting a sufficient number of registrations may be withdrawn without notice.
It should be noted that all the information in the description above applies to study-units available during the academic year 2022/3. It may be subject to change in subsequent years.