Study-Unit Description


TITLE Big Data Processing

LEVEL 05 - Postgraduate Modular Diploma or Degree Course


DEPARTMENT Artificial Intelligence

DESCRIPTION This unit gives students an in-depth study of scalable solutions for managing, processing and analysing Big Data on servers, on clusters of computers or in the cloud. In particular, students are exposed to different Big Data computing approaches, trends and technologies, such as Apache Hadoop (based on the scalable MapReduce computing approach), NoSQL technologies (graph databases), the Hive data warehouse system, and Pig Latin for productively creating large-scale data applications.

Study-unit Aims:

Through this study-unit, students will be given the opportunity to:
- understand the practical application of Big Data processing tools and techniques;
- learn how to use a range of modelling and big data analytical techniques;
- evaluate the advantages and limitations of different technologies related to big data processing;
- learn and work both independently and within groups;
- develop a balance between theoretical and practical skills.

Learning Outcomes:

1. Knowledge & Understanding:
By the end of the study-unit the student will be able to:

- gain awareness of major Big Data use cases in science and industry and the associated Big Data challenges;
- critically understand the data structures used in the context of Big Data;
- identify and understand the principles and functionalities of Big Data programming models and tools;
- acquire, process and manage large heterogeneous data collections;
- develop algorithms and systems for information and knowledge extraction from large data collections.

2. Skills:
By the end of the study-unit the student will be able to:

- employ current technologies for efficiently processing massive amounts of data (Big Data);
- design scalable solutions for Big Data;
- design effective techniques to combine data from structured and unstructured data sources.

Main Text/s and any supplementary readings:

Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman and Jeff Ullman (Cambridge University Press 2014)
Hadoop in Practice, Alex Holmes (Manning 2012)
Hadoop: The Definitive Guide (2nd Edition), Tom White (O'Reilly 2011)
Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer (Morgan and Claypool 2010)



Assessment Component/s    Resit Availability    Weighting
Project                   Yes                   100%

LECTURER/S Charles Abela (Co-ord.)
Joseph Bonello
Jean Paul Ebejer

The University makes every effort to ensure that the published Courses Plans, Programmes of Study and Study-Unit information are complete and up-to-date at the time of publication. The University reserves the right to make changes in case errors are detected after publication.
The availability of optional units may be subject to timetabling constraints.
Units not attracting a sufficient number of registrations may be withdrawn without notice.
All the information in the study-unit description above applies to the academic year 2017/8, provided the study-unit is available during that academic year, and may be subject to change in subsequent years.