Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/94196

| Title: | A comparison of penalized regression techniques |
| Authors: | Schembri, Lynsey (2014) |
| Keywords: | Regression analysis; Statistics; Mathematical statistics |
| Issue Date: | 2014 |
| Citation: | Schembri, L. (2014). A comparison of penalized regression techniques (Bachelor's dissertation). |
| Abstract: | The Ordinary Least Squares (OLS) method, as defined by Carl Friedrich Gauss in the 18th century, is a technique widely used to estimate parameter coefficients. However, as researchers studied the stability of this technique over the years, it was noted that when the data are characterized by multicollinearity, the coefficient estimates obtained through OLS prove to be weak. After recognizing this weakness within the OLS framework, researchers set out to develop new regression techniques that provide stable and reliable results when the data exhibit collinearity problems. These techniques include Ridge regression, the Least Absolute Shrinkage and Selection Operator (Lasso) regression, the Elastic Net and the Naive Elastic Net regressions. Ridge regression shall be introduced briefly; however, the main focus of this dissertation is on the latter three techniques. The Lasso minimizes the effect of multicollinearity by applying shrinkage to the coefficient estimates while simultaneously performing subset selection. Thus, unlike the model fitted by Ridge regression, the resulting Lasso model is parsimonious. The Elastic Net (EN) and Naive Elastic Net (NEN) regressions can be considered hybrids of Ridge and Lasso regression, in that their penalty function applies both shrinkage and subset selection. The novelty of the EN and NEN lies in their ability to tackle a problem that arises with the Lasso: when a group of highly correlated variables exists in the dataset, the Lasso tends to choose one variable from the group essentially at random, whereas the EN and NEN tend to keep or drop such variables together, a property generally known in the literature as the "grouping effect". These techniques shall be applied in two types of study, a simulation study and a real-life dataset study, in order to analyse and compare their performance under different scenarios. In the simulation study, three datasets with different levels of multicollinearity and different dimensions shall be analysed, while for the real-life dataset the high-dimensional case (n < p) shall be studied. (An illustrative sketch of these estimators is shown below this record.) |
| Description: | B.SC.(HONS)STATS.&OP.RESEARCH |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/94196 |
| Appears in Collections: | Dissertations - FacSci - 1965-2014; Dissertations - FacSciSOR - 2000-2014 |
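As a rough illustration of the comparison described in the abstract, the sketch below (not taken from the dissertation) fits scikit-learn's Ridge, Lasso and Elastic Net estimators to simulated data containing a block of nearly collinear predictors and more variables than observations (n < p). The data-generating setup, the penalty strength `alpha` and the mixing parameter `l1_ratio` are arbitrary illustrative choices, not the settings used in the study.

```python
# Minimal sketch: Ridge vs Lasso vs Elastic Net on collinear, high-dimensional data.
# Assumptions (not from the dissertation): simulated design, arbitrary penalty values.
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)

n, p = 50, 100                                  # high-dimensional case: n < p
z = rng.normal(size=(n, 1))
group = z + 0.05 * rng.normal(size=(n, 5))      # five nearly identical (collinear) predictors
noise = rng.normal(size=(n, p - 5))             # remaining predictors carry no signal
X = np.hstack([group, noise])

beta = np.zeros(p)
beta[:5] = 3.0                                  # only the correlated group is truly relevant
y = X @ beta + rng.normal(scale=0.5, size=n)

models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1, max_iter=50_000),
    "elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=50_000),
}

for name, model in models.items():
    coef = model.fit(X, y).coef_
    nonzero = int(np.sum(np.abs(coef) > 1e-8))          # size of the selected subset
    kept_in_group = int(np.sum(np.abs(coef[:5]) > 1e-8))  # how much of the correlated group survives
    print(f"{name:12s} non-zero coefficients: {nonzero:3d}  "
          f"kept from correlated group: {kept_in_group}")
```

In runs of this kind, Ridge typically keeps every coefficient non-zero (shrinkage without selection), the Lasso retains only a sparse subset and often just one member of the correlated group, while the Elastic Net tends to keep the correlated predictors together, which is the "grouping effect" referred to in the abstract.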
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| BSC(HONS)STATISTICS_Schembri_Lynsey_2014.pdf (Restricted Access) | | 8.28 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
