University of Malta

Study-Unit Description


TITLE Data-Driven Natural Language Processing

LEVEL 03 - Years 2, 3, 4 in Modular Undergraduate Course


DEPARTMENT Institute of Linguistics and Language Technology

DESCRIPTION This study-unit focuses on techniques for designing Natural Language Processing applications whose core is a statistical model of language derived from large linguistic corpora. The unit is divided into three main parts, as follows:

1. Part I deals with introductory material and some of the mathematical and linguistic background. In this part, participants will also be introduced to existing corpora, as well as annotation methods.

2. Part II focuses in detail on particular areas of corpus-driven research in NLP, and the methods used, with particular emphasis on:
- Research on words, word distributions, word frequencies and collocations.
- Semantic similarity and corpus-derived thesauri.
- N-gram language models.
- Hidden Markov models.
- Maximum Entropy models.
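
As an informal illustration of two of the topics above (word frequencies and N-gram language models), the following is a minimal sketch rather than material taken from the unit itself. It counts unigrams and bigrams in a toy corpus and estimates bigram probabilities with add-one smoothing. The corpus, the function names and the choice of smoothing method are all assumptions made for the example.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count unigrams and bigrams over tokenised sentences.

    Each sentence is padded with hypothetical boundary markers <s> and </s>.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Toy corpus, invented for illustration only.
toy_corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]

unigrams, bigrams = train_bigram_model(toy_corpus)
print(unigrams.most_common(3))                       # most frequent word types
print(bigram_prob("the", "cat", unigrams, bigrams))  # smoothed P(cat | the)
```

Add-one smoothing is used here only because it is the simplest way to avoid zero probabilities for unseen bigrams; on large corpora other smoothing methods are normally preferred.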

3. Part III aims to provide a more comprehensive picture of state-of-the-art NLP research, with emphasis on areas such as the following:
- Part of speech tagging.
- Statistical Parsing.
- Statistical techniques for Natural Language Generation.

Study-unit Aims:

Contemporary NLP applications increasingly rely on statistical generalisations from data repositories to circumvent the knowledge bottleneck that has plagued AI systems since their inception. A knowledge of statistical methods and algorithms is therefore fundamental to any student of NLP. This unit aims to:
- Give students a thorough grounding in such methods.
- Show how these methods are deployed in the construction of systems for the robust analysis and generation of Natural Language.

Learning Outcomes:

1. Knowledge & Understanding:

By the end of the study-unit the student will be able to:
- Identify the appropriate statistical models for a particular NLP problem.
- Design and implement a system that incorporates data-driven learning of natural language patterns.

2. Skills:

By the end of the study-unit the student will be able to:
- Formulate a research problem in probabilistic terms.
- Formulate and test hypotheses.
- Find and exploit resources (including web resources).
- Implement solutions to NLP problems using data-driven methods based on large corpora and stochastic models.

Main Text/s and any supplementary readings:

- C. Manning and H. Schütze (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
- D. Jurafsky and J. H. Martin (2009). Speech and Language Processing (2nd Ed). New York: Prentice Hall.

ADDITIONAL NOTES Pre-requisite Qualifications: Knowledge of Discrete Mathematics and Basic Probability Theory

STUDY-UNIT TYPE Lecture and Practicum

Assessment Component/s    Resit Availability    Weighting
Examination (2 Hours)     Yes                   40%
Project                   Yes                   60%

LECTURER/S Albert Gatt

The University makes every effort to ensure that the published Courses Plans, Programmes of Study and Study-Unit information are complete and up-to-date at the time of publication. The University reserves the right to make changes in case errors are detected after publication.
The availability of optional units may be subject to timetabling constraints.
Units not attracting a sufficient number of registrations may be withdrawn without notice.
The information in the study-unit description above applies to the academic year 2017/8, provided the study-unit is available during that year, and may be subject to change in subsequent years.
