Title: Predictive analysis of football matches using in-play data
Authors: Zammit, Matthew Joseph
Keywords: F.A. Premier League
Soccer -- England
Soccer -- Italy
Machine learning
Soccer -- Betting -- Malta
Issue Date: 2018
Citation: Zammit, M.J. (2018). Predictive analysis of football matches using in-play data (Master's dissertation).
Abstract: Sports betting has emerged as a booming industry, driven by the popularity of betting on different scenarios within sporting events. Football is one of the most popular sports, followed by millions of fans around the world. Its dynamic nature, low-scoring matches and the many complex variables that can influence a game make the outcome of a match hard to predict. In recent years, more detailed in-game statistics have been collected and analysed by professionals of the game. The aim of this study is to investigate the application of machine learning techniques for predicting the full-time result (Home Win/Draw/Away Win) of football matches at the half-time interval using in-play data. We collect and analyse a rich data set of temporal data from seven seasons of five major European leagues between 2009 and 2016. We focus our research on the application of random forest as the main machine learning technique for this problem. We build a genetic algorithm to perform feature selection and hyper-parameter tuning, to investigate whether the initial results can be further improved. Finally, we contextualise the data set with pre-match data and analyse how this changes the results and the predictors selected. We find that after feature selection and model tuning, the random forest has a mean accuracy of 45.0% (±1.6) on unseen data across the different leagues. With the addition of pre-match data the mean accuracy increased to 46.0% (±2.1), but the results for each league remained similar. We evaluate the different models on an unseen data set from the 2016/17 season. The tuned random forest using both pre-match and in-game data achieves a mean accuracy of 44.8% across the leagues. The highest accuracy was 50.0%, on the test sample of the English Premier League. The lowest was 40.0%, on the French and Spanish leagues.
We also converted the random forest classification to a probabilistic prediction based on the output of the underlying decision trees. We compare these probabilities to implied odds from a betting exchange (Betfair) on a small sample of matches from the unseen data of the English and Italian leagues. We used the Brier score to measure the accuracy of the predictions. Results show that, for both the English Premier League and the Italian Serie A, the accuracy of the random forest is similar to that of Betfair. This comparable performance may indicate that the machine learning predictions are similar to those of the betting exchange markets.
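The comparison described above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: that tree outputs are turned into probabilities as the fraction of trees voting for each outcome, that exchange odds are decimal odds normalised to sum to 1, and that the Brier score is the multi-class form summed over the three outcomes. All function names and the example figures are hypothetical and are not taken from the dissertation.

```python
from collections import Counter

OUTCOMES = ("H", "D", "A")  # home win, draw, away win


def vote_probabilities(tree_votes):
    """Fraction of trees voting for each outcome (assumed conversion)."""
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return {o: counts[o] / total for o in OUTCOMES}


def implied_probabilities(decimal_odds):
    """Inverse decimal odds, rescaled so the three outcomes sum to 1."""
    raw = {o: 1.0 / decimal_odds[o] for o in OUTCOMES}
    total = sum(raw.values())
    return {o: p / total for o, p in raw.items()}


def brier_score(probs, actual):
    """Multi-class Brier score for one match: 0 is perfect, 2 is worst."""
    return sum(
        (probs[o] - (1.0 if o == actual else 0.0)) ** 2 for o in OUTCOMES
    )


def mean_brier(predictions, results):
    """Average Brier score over a sample of matches."""
    scores = [brier_score(p, r) for p, r in zip(predictions, results)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up example: 100 trees and a set of made-up decimal odds.
    forest = vote_probabilities(["H"] * 60 + ["D"] * 25 + ["A"] * 15)
    market = implied_probabilities({"H": 2.0, "D": 4.0, "A": 4.0})
    print("forest Brier:", brier_score(forest, "H"))
    print("market Brier:", brier_score(market, "H"))
```

A lower mean Brier score indicates better-calibrated probabilities, so similar scores for the two sources on the same matches would correspond to the comparable performance reported above.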
Appears in Collections: Dissertations - FacICT - 2018
Dissertations - FacICTAI - 2018

Files in This Item:
Access: Restricted
Size: 3.98 MB
Format: Adobe PDF

Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.