University of Malta
 

Study-Unit Description


CODE CIS5113

 
TITLE Large Scale Databases

 
LEVEL 05 - Postgraduate Modular Diploma or Degree Course

 
ECTS CREDITS 5

 
DEPARTMENT Computer Information Systems

 
DESCRIPTION This unit focuses on current research topics in databases and on data modelling for the consolidation and presentation of an organisation’s data infrastructure. Following from this, it covers the scaling of data processing operations under varying data consistency requirements and, conversely, how to start from operational targets and determine an acceptable (or achievable) level of operational support.

When a number of databases and datasets are accessible to an organisation, it is in a position to consolidate these sources so as to provide "a subject oriented, nonvolatile, integrated, time variant collection of data in support of management's decisions" (B. Inmon).

In addition, combining a company’s data with data streaming in from sources of varying structure makes it possible to investigate and follow up opportunities that arise from day to day.

This unit presents the knowledge and know-how needed to build repositories on which data warehousing and data mining exercises can be executed. It also covers the handling of large data sets, which can originate from transactional systems or from pattern extraction programs (e.g. data extracted from large repositories).

Design and implementation techniques in SQL and procedural extensions to SQL are presented.

Furthermore, specialised tools and techniques for consolidating data and validating its quality are studied. It is now accepted practice to farm out a portion of this processing to less general-purpose DBMSs in order to reach performance targets.

A substantial part is devoted to query design and optimisation for these massive data repositories.

Study-unit Aims:

The aims of this study-unit are to:
- instil techniques for identifying and understanding the underlying databases (and the processes executed over them);
- introduce methodologies for moving data from a source to a destination and then integrating it into a centralised repository. This centralised database must adhere to its own set of integrity constraints and must make it possible to trace data back to its source;
- extend knowledge of the physical design of a database to include hardware and design techniques (e.g. what, how, when and where to index) that differ considerably from those used in on-line systems;
- apply data warehousing and data mining techniques that impose a heavy computational load when executed over massive datasets. Such queries and algorithms require careful study to design and optimise for execution, and it has become customary to apply a number of specific techniques to known problems;
- allow students to consider data-intensive distributed computing for both monolithic and object-oriented applications. For object-oriented applications, the main issue is the interoperability of objects running on different computer platforms and developed in different languages. A further set of tools introduced is the current crop of NoSQL DBMSs, which offer added performance provided their set-up is acceptable (e.g. it does not affect business processes).
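The aim of moving data from sources into a centralised repository that enforces its own integrity constraints and can trace data back to its source can be pictured with a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database; the two source extracts, the table names (staging_customer, dim_customer) and the country-code mapping are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical extracts from two operational sources that describe the same
# kind of entity (customers) with different keys and conventions.
CRM_ROWS = [(101, "alice  smith ", "MT"), (102, "Bob Borg", "mt")]
BILLING_ROWS = [("B-7", "Carla Vella", "Malta")]

# Assumed conformance rule applied in the transform step.
COUNTRY_CODES = {"MT": "MT", "MALTA": "MT"}


def build_repository() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Staging area: rows kept as extracted, tagged with their origin.
        CREATE TABLE staging_customer (
            source_system TEXT NOT NULL,
            source_key    TEXT NOT NULL,
            name          TEXT,
            country       TEXT
        );
        -- Central repository: its own integrity constraints plus lineage
        -- columns so every row can be traced back to its source.
        CREATE TABLE dim_customer (
            customer_id   INTEGER PRIMARY KEY,
            name          TEXT NOT NULL,
            country_code  TEXT NOT NULL CHECK (length(country_code) = 2),
            source_system TEXT NOT NULL,
            source_key    TEXT NOT NULL,
            UNIQUE (source_system, source_key)
        );
    """)
    return con


def extract(con: sqlite3.Connection) -> None:
    """Copy source rows unchanged into the staging area."""
    con.executemany(
        "INSERT INTO staging_customer VALUES ('crm', ?, ?, ?)",
        [(str(key), name, country) for key, name, country in CRM_ROWS])
    con.executemany(
        "INSERT INTO staging_customer VALUES ('billing', ?, ?, ?)",
        BILLING_ROWS)


def transform_and_load(con: sqlite3.Connection) -> None:
    """Clean the staged rows and load them, keeping their lineage."""
    staged = con.execute(
        "SELECT source_system, source_key, name, country "
        "FROM staging_customer ORDER BY rowid").fetchall()
    for system, key, name, country in staged:
        clean_name = " ".join(name.split()).title()   # collapse spaces, fix case
        code = COUNTRY_CODES[country.strip().upper()]  # conform country values
        con.execute(
            "INSERT INTO dim_customer "
            "(name, country_code, source_system, source_key) "
            "VALUES (?, ?, ?, ?)",
            (clean_name, code, system, key))
    con.commit()


if __name__ == "__main__":
    con = build_repository()
    extract(con)
    transform_and_load(con)
    # Trace a consolidated row back to the system it came from.
    print(con.execute("SELECT source_system, source_key FROM dim_customer "
                      "WHERE name = 'Carla Vella'").fetchone())
    # prints ('billing', 'B-7')
```

The UNIQUE (source_system, source_key) constraint is one simple way to make reloading the same source row fail loudly rather than silently duplicate data; a production pipeline would typically also record load timestamps and batch identifiers.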

Learning Outcomes:

1. Knowledge & Understanding:
By the end of the study-unit the student will be able to:

Recognise the need for, and know how to build, a cross-organisation data infrastructure for data warehousing and data mining exercises;
Evaluate data sources and determine how to extract data and move it into a staging area;
Build an organisation wide data repository for data warehousing and data mining (at logical and physical level);
Write complex queries in SQL and SQL procedural extensions;
Write complex queries for NoSQL DBMSs, including their procedural extensions;
Understand the concept of distributed data and functions across networks of computers;
Build a framework that enables distributed computation across databases and massive datasets;
Explain the difference between building the infrastructure and querying it in terms of computational load;
Explain query processing and optimisation in massive datasets.
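Distributed computation across databases and massive datasets often relies on partial (two-phase) aggregation: each node aggregates its own data and a coordinator merges the partial results. The sketch below is a minimal illustration under stated assumptions: the partitions are hypothetical in-memory lists standing in for data held on separate nodes, and threads merely simulate those nodes; a real framework would ship the local step to each node.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal partitions of a sales fact table; each list stands
# in for the (region, amount) rows held by one node.
PARTITIONS = [
    [("north", 10.0), ("south", 4.0), ("north", 2.0)],
    [("south", 6.0), ("east", 9.0)],
    [("north", 3.0), ("east", 3.0)],
]


def local_aggregate(rows):
    """Node-side step: partial SUM and COUNT per key.

    Local averages cannot be merged correctly, so each node ships
    (sum, count) pairs, which combine by simple addition.
    """
    partial = defaultdict(lambda: [0.0, 0])
    for key, amount in rows:
        partial[key][0] += amount
        partial[key][1] += 1
    return dict(partial)


def merge(partials):
    """Coordinator step: combine partial results, then finalise AVG."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for key, (s, c) in part.items():
            total[key][0] += s
            total[key][1] += c
    return {key: (s, c, s / c) for key, (s, c) in total.items()}


def distributed_avg(partitions):
    # Threads only simulate independent nodes working in parallel.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    return merge(partials)


if __name__ == "__main__":
    for region, (s, c, avg) in sorted(distributed_avg(PARTITIONS).items()):
        print(region, s, c, avg)
```

Merging (sum, count) pairs gives the correct north average of 5.0, whereas averaging the two local north averages (6.0 and 3.0) would give 4.5. The difference in load between building such partial results and querying them is one instance of the computational trade-offs listed above.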

2. Skills:
By the end of the study-unit the student will be able to:

Create large scale distributed and interoperable systems;
Write and implement a complex database design for an enterprise infrastructure using a high-level database language;
Write and implement non-trivial extract, load and transform methods to consolidate the source databases into the infrastructure;
Write and implement extract, load and transform methods to read the output of pattern extraction programs;
Write SQL commands using ROLLUP (and CUBE), top-n queries, GROUP BY, partitions and common table expressions (CTEs);
Write procedures with embedded queries for basic algorithms that extract patterns;
Write code for specific data-intensive problems, e.g. association rules and clustering;
Write code for dimension reduction for data intensive problems and datasets;
Write code to implement data mining in time series datasets;
Select, use, and deploy specialised tools for data warehousing and data mining.
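Several of the SQL skills listed above (roll-up, GROUP BY, top-n and CTEs) can be illustrated with a small sketch run through Python's built-in sqlite3 module on a hypothetical sales table. SQLite does not implement GROUP BY ROLLUP or CUBE, so the roll-up is emulated here with UNION ALL over the grouping levels; on DBMSs that support it (e.g. PostgreSQL or Oracle), GROUP BY ROLLUP (region, product) produces the same rows. The top-n query uses a CTE followed by ORDER BY and LIMIT.

```python
import sqlite3

# Hypothetical fact rows: (region, product, amount).
SALES = [
    ("north", "tea", 10), ("north", "coffee", 30),
    ("south", "tea", 25), ("south", "coffee", 5),
    ("east", "tea", 12),
]

# Emulated ROLLUP(region, product): detail rows, per-region subtotals
# (product IS NULL) and a grand total (both columns NULL).
ROLLUP_SQL = """
WITH base AS (SELECT region, product, amount FROM sales)
SELECT region, product, SUM(amount) FROM base GROUP BY region, product
UNION ALL
SELECT region, NULL, SUM(amount) FROM base GROUP BY region
UNION ALL
SELECT NULL, NULL, SUM(amount) FROM base
ORDER BY 1, 2
"""

# Top-n regions by total sales, using a common table expression.
TOP_N_SQL = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
)
SELECT region, total
FROM region_totals
ORDER BY total DESC
LIMIT ?
"""


def load(con: sqlite3.Connection) -> None:
    con.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES)


def rollup(con: sqlite3.Connection):
    return con.execute(ROLLUP_SQL).fetchall()


def top_n_regions(con: sqlite3.Connection, n: int):
    return con.execute(TOP_N_SQL, (n,)).fetchall()


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    load(con)
    for row in rollup(con):
        print(row)
    print(top_n_regions(con, 2))  # prints [('north', 40), ('south', 30)]
```

Ties at the n-th position are cut arbitrarily by LIMIT; where ties matter, a window function such as RANK() OVER (ORDER BY total DESC) is the usual alternative on DBMSs that support it.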

Main Text/s and any supplementary readings:

• Fundamentals of Database Systems, Ramez Elmasri, Shamkant B. Navathe, 6th Edition, 2010, Addison Wesley, ISBN-13: 978-0136086208.
• Data Mining: Concepts and Techniques, Jiawei Han, Micheline Kamber, Jian Pei, 3rd Edition, 2011, Morgan Kaufmann (The Morgan Kaufmann Series in Data Management Systems), ISBN-13: 978-0123814791.
• Data Warehouse Design: Modern Principles and Methodologies, Matteo Golfarelli, Stefano Rizzi, 2011, McGraw-Hill Osborne, ISBN-13: 978-0071610391.
• Principles of Distributed Database Systems, M. Tamer Özsu, Patrick Valduriez, 3rd Edition, 2011, Springer, ISBN-13: 978-1-4419-8833-1.
• A number of research papers are made available.
• System manuals, as needed (available in the department's labs).

Note: Inmon and Kimball books are still a good read for data warehousing.

 
RULES/CONDITIONS Before taking this study-unit you are advised to take CIS2090 or CIS3107

 
STUDY-UNIT TYPE Lecture, Independent Study and Practical

 
METHOD OF ASSESSMENT
Assessment Component/s | Resit Availability | Weighting
Practical | Yes | 30%
Examination (2 Hours) | Yes | 70%

 
LECTURER/S Joseph Vella

 
The University makes every effort to ensure that the published Course Plans, Programmes of Study and Study-Unit information are complete and up-to-date at the time of publication. The University reserves the right to make changes in case errors are detected after publication.
The availability of optional units may be subject to timetabling constraints.
Units not attracting a sufficient number of registrations may be withdrawn without notice.
It should be noted that all the information in the study-unit description above applies to the academic year 2017/8, if the study-unit is available during this academic year, and may be subject to change in subsequent years.