Study-Unit Description

CODE

CCE5108

TITLE

Data Science in Python

UM LEVEL

05 - Postgraduate Modular Diploma or Degree Course

MQF LEVEL

ECTS CREDITS

DEPARTMENT

Communications and Computer Engineering

DESCRIPTION

This study-unit covers how to use Python to analyse and visualise data and to build machine learning models. Some prior programming experience and theoretical knowledge of data analysis tools and machine learning are assumed. The study-unit starts with a review of programming in Python, covering environments and functionality, including the use of Jupyter Notebooks. It then covers in depth the Python libraries and modules often used in data science: pandas for data manipulation and processing, numpy and scipy for data analysis and representation, matplotlib for data visualisation, and scikit-learn, tensorflow and keras for building and deploying predictive machine learning models.

Study-Unit Aims:

- To comprehensively teach how to use readily available toolkits in Python when carrying out various data science tasks, such as data preparation, visualisation and predictive model development;
- To recall and discuss, where appropriate, data analysis tools and machine learning algorithms (covered in other, more theoretical study-units);
- To give students the opportunity to gain invaluable practice with the methods and toolkits (libraries) in a number of real-world problem settings via programming assignments.

Learning Outcomes:

1. Knowledge & Understanding:
By the end of the study-unit the student will be able to:

- Describe Jupyter notebooks and virtual environments;
- Explain DataFrames in Pandas and how the data structures are indexed;
- Describe the various array data structures and associated functions in numpy;
- Explain the data structures in tensorflow and keras;
- Describe the data science and machine learning pipelines available in scikit-learn;
- Describe the most commonly used functions in matplotlib for data visualisation;
- Describe the most commonly used data-analysis tools in scikit-learn and scipy;
- Review data manipulation tools in numpy;
- Review data manipulation tools in scikit-learn;
- Review and describe various machine learning models in scikit-learn;
- Understand how to specify and train models in tensorflow and keras;
- Describe how a predictive model is deployed in production.

2. Skills:
By the end of the study-unit the student will be able to:

- Write Python code in Jupyter notebooks and virtual environments;
- Read in data to Pandas DataFrames and carry out basic operations;
- Query and merge Pandas DataFrames;
- Group and manipulate data in Pandas;
- Generate summary tables in Pandas and visualise data using built-in functions;
- Perform data cleaning and data imputation using readily available functions;
- Write code to visualise data and relationships among features in matplotlib and in seaborn;
- Prepare data for training predictive models using scikit-learn: normalise, bin and sample data; compute the dataset distribution; shuffle and split the dataset;
- Train and evaluate models using scikit-learn: k-nearest neighbours (kNN), linear and logistic regression, decision trees, random forests and support vector machines;
- Optimise hyperparameters using k-fold cross-validation in scikit-learn;
- Design an evaluation experiment: choose the appropriate metrics and compare models in terms of automatic performance metrics;
- Carry out feature selection in scikit-learn using filters and greedy algorithms;
- Transform data to lower dimensions using PCA, LDA and t-SNE in scikit-learn;
- Specify neural network models (feed-forward, recurrent and gated architectures) in tensorflow and keras;
- Select the appropriate activation functions, loss functions and optimiser for the given examples;
- Store training meta-data and visualise using matplotlib for model debugging;
- Use Keras functions to deal with missing data, categorical data and model evaluation;
- Use Keras functions to load data on the fly during model training;
- Use GPUs during the training and deployment of predictive models;
- Deploy a Python model in production using Docker and FastAPI.

Main Text/s and any supplementary readings:

- “Python Data Science Handbook: Essential Tools for Working with Data”, 1st Edition, Jake VanderPlas (Author), O’Reilly, available at https://jakevdp.github.io/PythonDataScienceHandbook/.
- “Building Machine Learning Systems with Python”, 3rd Edition, Luis Pedro Coelho, Wilhelm Richert, Matthieu Brucher (Authors), Packt Publishing, ISBN: 9781788623223.
- “Deep Learning with Python”, 2nd Edition, Francois Chollet (Author), Manning, ISBN-13: 978-1617296864.

STUDY-UNIT TYPE

Lecture and Independent Study

METHOD OF ASSESSMENT

Assessment Component/s	Sept. Asst Session	Weighting
Assignment	Yes	25%
Assignment	Yes	25%
Assignment	Yes	25%
Assignment	Yes	25%

LECTURER/S

Fabian Micallef

The University makes every effort to ensure that the published Courses Plans, Programmes of Study and Study-Unit information are complete and up-to-date at the time of publication. The University reserves the right to make changes in case errors are detected after publication.
The availability of optional units may be subject to timetabling constraints.
Units not attracting a sufficient number of registrations may be withdrawn without notice.
It should be noted that all the information in the description above applies to study-units available during the academic year 2025/6. It may be subject to change in subsequent years.
