Study-Unit Description



TITLE Statistics for Data Scientists

UM LEVEL 05 - Postgraduate Modular Diploma or Degree Course



DEPARTMENT Artificial Intelligence

DESCRIPTION In this study-unit, students will be exposed to real-life data analysis. The focus is on the collection, cleaning and visualization of data for analysis and model-building purposes using the R programming language. The analysis will include the application of descriptive and inferential statistics. The probability theory required for building machine-learning (ML) models will be explained, including the axioms of probability, random variables, probability distributions, joint and conditional probabilities, and Bayes' Theorem. Practical examples will be used to demonstrate how these techniques are applied in real-world scenarios. Modelling using Markov models will also be explained.

Typical scenarios analysed by students include building successful betting models from data provided by betting companies, analysing the relationship between news and cryptocurrencies, and reviewing the ratings of local restaurants. Students have also used the techniques shown in class to determine whether football teams perform differently following a midweek Champions League or Europa League match. The assessment for this study-unit is based on making a statement and producing an analysis that supports it.

Study-unit Aims:

The aims of this study-unit are to:

- Give students practical experience of executing a successful, complete data science project;
- Explain how to use a range of modelling, data analytics and visualization techniques;
- Help students understand, appreciate and apply the relevant statistical techniques used in data science;
- Make students aware of the challenges of a data science project.

Learning Outcomes:

1. Knowledge & Understanding:

By the end of the study-unit the student will be able to:

- Design practical data science projects.
- Build, compare and evaluate different computational and statistical models.
- Gather evidence and apply statistics to support a claim.
- Analyse data scientifically using significance testing.

2. Skills:

By the end of the study-unit the student will be able to:

- Produce informative visualizations of the data using ggplot2, either to explain the data or to summarise the results.
- Program in R.
- Build data-rich web applications using Shiny.
- Run specific hypothesis tests on real-world data.
- Build Markov models.

Main Text/s and any supplementary readings:

Chambers, J.M. (2008) Software for Data Analysis: Programming with R (Statistics and Computing), New York, Springer-Verlag.

James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013) An Introduction to Statistical Learning: With Applications in R, New York, Springer-Verlag.

Field, A., Miles, J. and Field, Z. (2012) Discovering Statistics Using R, London, SAGE Publications Ltd.

Pishro-Nik, H. (2014) Introduction to Probability, Statistics and Random Processes.

STUDY-UNIT TYPE Lecture and Tutorial

Assessment Component/s | Assessment Due | September Assessment Session | Weighting
Project                | SEM2           | Yes                          | 100%

LECTURER/S Joseph Bonello
Jean Paul Ebejer


The University makes every effort to ensure that the published Course Plans, Programmes of Study and Study-Unit information are complete and up-to-date at the time of publication. The University reserves the right to make changes in case errors are detected after publication.
The availability of optional units may be subject to timetabling constraints.
Units not attracting a sufficient number of registrations may be withdrawn without notice.
It should be noted that all the information in the description above applies to study-units available during the academic year 2022/3. It may be subject to change in subsequent years.