Study-Unit Description


CODE LIN3012

 
TITLE Data-Driven Natural Language Processing

 
UM LEVEL 03 - Years 2, 3, 4 in Modular Undergraduate Course

 
MQF LEVEL 6

 
ECTS CREDITS 5

 
DEPARTMENT Institute of Linguistics and Language Technology

 
DESCRIPTION This unit focuses on techniques for designing Natural Language Processing applications whose core is a statistical model of language derived from large linguistic corpora using machine-learning techniques. The emphasis of the unit is therefore on the nature of these techniques and on how they lend themselves to solving particular challenges in the analysis and generation of language.

The unit covers the following topics:
1. Fundamentals of machine learning and types of machine learning problems;
2. Supervised classification techniques for Natural Language, with a focus on:
(a) Naive Bayes classification;
(b) Logistic Regression;
(c) Feedforward neural networks for supervised classification.
3. Representation Learning for natural language, specifically the acquisition of distributional semantic models and dense word embeddings;
4. Sequence modelling for word sequences and grammar:
(a) n-gram models with backoff and smoothing techniques;
(b) neural language models;
(c) recurrent neural networks.
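
As an illustration of topic 4(a), the following is a minimal sketch of a bigram model with add-one (Laplace) smoothing. It is not taken from the unit's materials: the toy corpus, the function names and the choice of add-one smoothing (rather than backoff) are assumptions made here purely for illustration.

```python
from collections import Counter


def train_bigram_counts(sentences):
    """Count bigram contexts and bigrams, padding each sentence with <s> and </s>."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        # Only tokens that can be predicted belong in the vocabulary,
        # so the start symbol <s> is left out.
        vocab.update(tokens[1:])
        for prev, curr in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, curr)] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, curr, context_counts, bigram_counts, vocab):
    """Estimate P(curr | prev) with add-one smoothing over the vocabulary."""
    return (bigram_counts[(prev, curr)] + 1) / (context_counts[prev] + len(vocab))


# Hypothetical toy corpus, invented for this example.
toy_corpus = ["the cat sat", "the dog sat", "the cat ran"]
context_counts, bigram_counts, vocab = train_bigram_counts(toy_corpus)

# A bigram seen in training and one that never occurs in the corpus.
p_seen = bigram_prob("the", "cat", context_counts, bigram_counts, vocab)
p_unseen = bigram_prob("the", "ran", context_counts, bigram_counts, vocab)
```

The unseen bigram receives a small non-zero probability instead of zero, which is the purpose of smoothing; backoff methods redistribute probability mass differently and are not shown in this sketch.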

The above four core topics are in turn applied to a variety of natural language analysis and generation tasks, including, but not limited to:
- Sentiment analysis;
- Word sense disambiguation;
- Semantic modelling and bias detection;
- Part of Speech tagging;
- Text Generation.
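
To make the link between a core topic and a task concrete, the sketch below applies Naive Bayes classification to sentiment analysis. It is an illustrative example only, assuming a multinomial model with add-one smoothing; the four training sentences, the labels and the function names are hypothetical and do not come from the unit's materials.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """Collect per-label document counts and word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(text, label_counts, word_counts, vocab):
    """Return the label with the highest log prior plus summed log likelihoods."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing avoids zero probabilities for unseen (word, label) pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, invented for this example.
toy_training_data = [
    ("a great and enjoyable film", "pos"),
    ("great acting and a great plot", "pos"),
    ("a dull and boring film", "neg"),
    ("boring plot and dull acting", "neg"),
]
model = train_naive_bayes(toy_training_data)
prediction = classify("an enjoyable plot with great acting", *model)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which matters for documents much longer than these toy sentences.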

In addition to weekly lectures, students attend supervised practical sessions that introduce the techniques and tools needed to implement the above solutions.

Study-unit Aims:

Contemporary NLP applications increasingly rely on statistical generalisations from data repositories to circumvent the knowledge bottleneck that has plagued AI systems since their inception. A knowledge of statistical methods and algorithms is therefore fundamental to any student of NLP.

This unit aims to:
- give students a thorough grounding in such methods;
- show how these methods are deployed in the construction of systems for the robust analysis and generation of Natural Language;
- pave the way for other units in speech processing, information extraction, and multilingual computing.

Learning Outcomes

1. Knowledge & Understanding
By the end of the study-unit the student will be able to:

- identify the appropriate statistical models for a particular NLP problem;
- design and implement a system that incorporates data-driven learning of natural language patterns;
- evaluate such a system using appropriate evaluation methods.
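
As one example of what an appropriate evaluation method can look like for a classification task, the sketch below computes precision, recall and F1 for a single label. The gold and predicted labels are invented for illustration, and the function name is hypothetical; the unit may use other metrics or library implementations.

```python
def precision_recall_f1(gold, predicted, positive_label):
    """Compute precision, recall and F1 for one label from parallel label lists."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == positive_label and p == positive_label)
    false_pos = sum(1 for g, p in pairs if g != positive_label and p == positive_label)
    false_neg = sum(1 for g, p in pairs if g == positive_label and p != positive_label)

    # Guard against division by zero when a label is never predicted or never occurs.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical gold-standard and system labels, invented for this example.
gold_labels = ["pos", "pos", "neg", "neg", "pos"]
predicted_labels = ["pos", "neg", "neg", "pos", "pos"]
precision, recall, f1 = precision_recall_f1(gold_labels, predicted_labels, "pos")
```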

2. Skills
By the end of the study-unit the student will be able to:

- formulate a research problem in probabilistic terms;
- formulate and test hypotheses;
- find and exploit resources (including web resources);
- implement a machine-learning experiment to address a specific NLP problem.

Main Text/s and any supplementary readings:

Main text

- D. Jurafsky and J. H. Martin (2009). Speech and language processing (2nd Ed). Upper Saddle River, NJ: Prentice Hall
(A third edition is in progress. The current version is available online at https://web.stanford.edu/~jurafsky/slp3/)

Supplementary readings for deeper understanding of machine learning and practical issues

- I. Goodfellow, Y. Bengio and A. Courville. (2016). Deep learning. Cambridge, MA: MIT Press
See also the companion website: https://www.deeplearningbook.org
- A. Géron (2017). Hands-on machine learning with Scikit-Learn and TensorFlow. Sebastopol, CA: O'Reilly

An older text, which still has useful insights

- C. Manning and H. Schütze (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press

 
ADDITIONAL NOTES Pre-requisite Qualifications: Knowledge of Discrete Mathematics and Basic Probability Theory

Students who attend this unit must have a good background in statistics and probability, as well as be able to program in Python.

 
STUDY-UNIT TYPE Lecture and Practicum

 
METHOD OF ASSESSMENT
Assessment Component/s    Sept. Asst Session    Weighting
Examination (2 Hours)     Yes                   40%
Project                   Yes                   60%

 
LECTURER/S Albert Gatt

 

 
The University makes every effort to ensure that the published Courses Plans, Programmes of Study and Study-Unit information are complete and up-to-date at the time of publication. The University reserves the right to make changes in case errors are detected after publication.
The availability of optional units may be subject to timetabling constraints.
Units not attracting a sufficient number of registrations may be withdrawn without notice.
It should be noted that all the information in the description above applies to study-units available during the academic year 2023/4. It may be subject to change in subsequent years.

https://www.um.edu.mt/course/studyunit