Study-Unit Description


TITLE Data Science: From Data to Knowledge

LEVEL 03 - Years 2, 3, 4 in Modular Undergraduate Course


DEPARTMENT Computer Science

DESCRIPTION This study-unit introduces all the phases of a data science project and aims to make the student appreciate a formal and rigorous approach to the handling of data.

The first phase is data collection: how to gather the raw material for any data project. Practical examples of incomplete and noisy data will be provided. Next, we will clean and store the data, with an overview of ubiquitous data formats and different types of storage technologies (e.g. relational, graph, and key-value databases). We will then explore different ways to visualize the collected data and to communicate results.

The next part of the study-unit focuses on building models with the data at hand. In data modelling we will progress from simple but powerful statistical techniques (e.g. linear regression) to more complex machine learning methods. In the last part we will discuss the challenges of big data (e.g. in a bioinformatics setting) and techniques used to cope with the scale of the data.

Throughout the unit the student will be guided using practical, real-world examples.

Study-unit Aims:

"The role of Data Scientist is the sexiest job of the 21st century." [1] The aim of this study-unit is for students to be able to dissect such claims and to determine whether they are supported by data. This unit aims to teach the student how to use data to build predictive models for use in different industries and scientific areas. The student will be trained to fulfil the growing need for data science and business intelligence roles in industry.

[1] Harv. Bus. Rev. 2012 Oct;90(10):70-6, 128.

Learning Outcomes:

1. Knowledge & Understanding:
By the end of the study-unit the student will be able to:

- Complete a data analysis project from start to finish (all phases: processing, storing, visualization, analysis and modelling);
- Formulate a hypothesis and prove/disprove it based on the evidence (i.e. data);
- Build statistical and machine learning models to predict outcomes using real-life and artificial datasets;
- Appreciate the interdisciplinary nature of the field, which involves statistics, cognitive science, and computer science.

2. Skills:
By the end of the study-unit the student will be able to:

- Use Python and its data science libraries (e.g. scikit-learn, matplotlib, Pandas);
- Communicate results from a data science project using appropriate visualization techniques;
- Apply statistical tests to determine whether datasets are significantly different from each other;
- Use Hadoop for a Big Data project.

Main Text/s and any supplementary readings:

Main Texts:

- Doing Data Science: Straight Talk from the Frontline (2013), Cathy O'Neil, Rachel Schutt
O'Reilly's take on data science, based on a set of lectures

- Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (2012), Wes McKinney
Covers some of the Python libraries we will be using in this study-unit

Supplementary Material:

- Naked Statistics: Stripping the Dread from the Data (2014), Charles Wheelan
Gives you a solid grasp of statistics

- Data Scientists at Work (2014), Sebastian Gutierrez
Contains a set of interviews with luminaries in the data science field. Useful to learn which technologies are used in industry

- Statistics in Plain English (2010), Timothy C. Urdan
Excellent first textbook for people who want to gain a working knowledge in statistics

- The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2011), Trevor Hastie, Robert Tibshirani, Jerome Friedman
Very popular advanced text. Together with Tom Mitchell's "Machine Learning", it is considered the bible of the field

- The Signal and the Noise: Why So Many Predictions Fail - But Some Don't (2012), Nate Silver
Great read by Nate Silver (famous for his correct US election predictions)

STUDY-UNIT TYPE Lecture, Independent Study, Practicum & Tutorial

Assessment Component/s              Assessment Due    Resit Availability    Weighting
Project (including Presentation)    SEM1              Yes                   100%

LECTURER/S Jean Paul Ebejer

The University makes every effort to ensure that the published Courses Plans, Programmes of Study and Study-Unit information are complete and up-to-date at the time of publication. The University reserves the right to make changes in case errors are detected after publication.
The availability of optional units may be subject to timetabling constraints.
Units not attracting a sufficient number of registrations may be withdrawn without notice.
It should be noted that all the information in the study-unit description above applies to the academic year 2018/9, if the study-unit is available during this academic year, and may be subject to change in subsequent years.