Study-Unit Description


CODE ICS2203

 
TITLE Statistical Natural Language Processing

 
UM LEVEL 02 - Years 2, 3 in Modular Undergraduate Course

 
MQF LEVEL 5

 
ECTS CREDITS 5

 
DEPARTMENT Artificial Intelligence

 
DESCRIPTION Natural Language Processing (NLP) is the study of the computational treatment of human languages, with a particular focus on the interaction between computers and humans through natural language. It is the driving force behind many applications, such as virtual assistants, chatbots, sentiment analysis, automatic summarization and machine translation.

This study-unit introduces students to fundamental concepts, methods, and algorithms in NLP, with a focus on statistical and probabilistic approaches. It exposes students to the types of machine learning algorithms that are used to process, analyse and model both text and speech data. Students will learn to approach core NLP tasks using statistical techniques, thus developing an appropriate foundation in NLP. Key topics include:

- Introduction to Natural Language and Speech Processing, including core concepts
- Language Modelling using n-grams and probabilistic models
- Text classification using statistical approaches (e.g. Naïve Bayes, Logistic Regression)
- Part-of-Speech tagging and sequence labelling
- Named-Entity Recognition (NER) and chunking
- Distributional semantics and word representations
- Speech feature template matching
- Evaluation metrics for NLP systems.

Study-unit Aims:

The study-unit aims to:
- Introduce students to the fundamental concepts and challenges in NLP, focusing on statistical and probabilistic approaches;
- Provide students with a clear knowledge-base and understanding of language modelling, text classification, and part-of-speech tagging;
- Equip students with practical programming skills and a problem-solving approach to processing language;
- Develop the ability to apply statistical models to solve real-world NLP problems and to evaluate system performance using the appropriate metrics;
- Foster an awareness of the complexities involved in processing low-resource languages, particularly Maltese, within the broader context of multilingual NLP.

Learning Outcomes:

1. Knowledge & Understanding
By the end of the study-unit the student will be able to:

- Understand the core concepts and terminology in NLP, such as n-gram models, Hidden Markov Models, and Maximum Entropy Models;
- Better understand the requirements of processing large volumes of data;
- Explain statistical techniques underlying fundamental NLP tasks, including language modelling, classification, and sequence labelling;
- Discuss evaluation metrics and their significance in assessing the performance of NLP systems;
- Recognise the specific challenges associated with processing low-resource languages such as Maltese;
- Explain the process behind speech signal analysis and the statistical measures used for pattern matching.

2. Skills
By the end of the study-unit the student will be able to:

- Process text and speech data for the purpose of linguistic annotation and processing using standard NLP techniques (e.g. tokenization, lemmatization, stemming);
- Apply statistical models to solve NLP tasks such as text classification, part-of-speech tagging, and named-entity recognition;
- Implement n-gram language models and evaluate their performance using metrics such as perplexity;
- Analyse the outputs of NLP models and interpret their performance;
- Pattern-match speech sounds and classify them with their corresponding labels.

Main Text/s and any supplementary readings:

- D. Jurafsky and J.H. Martin (2025). Speech and Language Processing (3rd edition, draft). Available online.
- T. Mitchell (1997). Machine Learning. McGraw Hill.
- C.D. Manning and H. Schütze (1999). Foundations of Statistical Natural Language Processing. MIT Press.
- S. Bird, E. Klein and E. Loper (2009). Natural Language Processing with Python. O'Reilly.

 
STUDY-UNIT TYPE Blended Learning

 
METHOD OF ASSESSMENT
Assessment Component/s   Assessment Due   Sept. Asst Session   Weighting
Project                  SEM1             Yes                  100%

 
LECTURER/S Claudia Borg
Andrea De Marco

 

 
It should be noted that all the information in the description above applies to study-units available during the academic year 2025/6. It may be subject to change in subsequent years.

https://www.um.edu.mt/course/studyunit