CODE 
CPS3235 

TITLE 
Data Science: From Data to Knowledge 

LEVEL 
03 - Years 2, 3, 4 in Modular Undergraduate Course

ECTS CREDITS 
5 

DEPARTMENT 
Computer Science 

DESCRIPTION 
This study-unit aims to introduce all the phases of a data science project and to make the student appreciate a formal and rigorous approach to the handling of data.
The first phase is data collection: how to gather the raw material for any data project. Practical examples of incomplete and noisy data will be provided. Next, we will clean and store the data. We will give an overview of common data formats and of different types of storage technologies (e.g. relational, graph and key-value databases). We will then explore the different ways to visualize the collected data and to communicate results.
The next part of the study-unit focuses on building models with the data at hand. In data modelling we will progress from simple but powerful statistical techniques (e.g. linear regression) to more complex machine learning methods. In the last part we will discuss the challenges of big data (e.g. in a bioinformatics setting) and the techniques used to cope with the scale of the data.
Throughout the unit the student will be guided using practical, real-world examples.
Study-unit Aims:
"The role of Data Scientist is the sexiest job of the 21st century." [1] The aim of this study-unit is for students to be able to dissect such claims and to determine whether they are supported by data. This unit aims to teach the student how to use data to build predictive models for use in different industries and scientific areas. The student will be trained to meet the growing demand for data science and business intelligence roles in industry.
[1] Harv. Bus. Rev. 2012 Oct;90(10):70-6, 128.
Learning Outcomes:
1. Knowledge & Understanding: By the end of the study-unit the student will be able to:
- Complete a data analysis project from start to finish (all phases: processing, storing, visualization, analysis and modelling);
- Formulate a hypothesis and prove or disprove it based on the evidence (i.e. data);
- Build statistical and machine learning models to predict outcomes using real-life and artificial datasets;
- Appreciate the interdisciplinary nature of the field, which involves statistics, cognitive science and computer science.
2. Skills: By the end of the study-unit the student will be able to:
- Use Python and its data science libraries (e.g. scikit-learn, matplotlib, Pandas);
- Communicate results from a data science project using appropriate visualization techniques;
- Apply statistical tests to determine whether datasets are significantly different from each other;
- Use Hadoop for a Big Data project.
Main Text/s and any supplementary readings:
Main Texts:
- Doing Data Science: Straight Talk from the Frontline (2013), Cathy O'Neil, Rachel Schutt. O'Reilly's take on data science, based on a set of lectures.
- Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (2012), Wes McKinney. Covers some of the Python libraries we will be using in this study-unit.
Supplementary Material:
- Naked Statistics: Stripping the Dread from the Data (2014), Charles Wheelan. Gives you a solid grasp of statistics.
- Data Scientists at Work (2014), Sebastian Gutierrez. Contains a set of interviews with luminaries in the data science field. Useful for learning which technologies are used in industry.
- Statistics in Plain English (2010), Timothy C. Urdan. Excellent first textbook for people who want to gain a working knowledge of statistics.
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2011), Trevor Hastie, Robert Tibshirani, Jerome Friedman. Very popular advanced text. Together with Tom Mitchell's "Machine Learning", it is considered the bible of the field.
- The Signal and the Noise: Why So Many Predictions Fail - but Some Don't (2012), Nate Silver. Great read by Nate Silver (famous for his correct US election predictions).


STUDY-UNIT TYPE
Lecture, Independent Study, Practicum & Tutorial 

METHOD OF ASSESSMENT 
Assessment Component/s: Project (including Presentation)
Resit Availability: Yes
Weighting: 100%


LECTURER/S 
Jean Paul Ebejer


The University makes every effort to ensure that the published Courses Plans, Programmes of Study and Study-Unit information are complete and up-to-date at the time of publication. The University reserves the right to make changes in case errors are detected after publication.
The availability of optional units may be subject to timetabling constraints.
Units not attracting a sufficient number of registrations may be withdrawn without notice.
It should be noted that all the information in the study-unit description above applies to the academic year 2017/8, if the study-unit is available during this academic year, and may be subject to change in subsequent years.

26 September 2017
http://www.um.edu.mt/ict/studyunit