Title: Analysis of police violence records through text mining techniques
Authors: Barbara, Christina (2021)
Keywords: Data mining
Data sets
Cluster analysis -- Computer programs
Police brutality
Victims of violent crimes
Issue Date: 2021
Citation: Barbara, C. (2021). Analysis of police violence records through text mining techniques (Bachelor’s dissertation).
Abstract: In this research, we apply data mining techniques to the Mapping Police Violence dataset, which provides information on every individual killed by police in the USA since 2013. Specifically, we focus on the killings which took place from 2013 to 2019. Motivated by the availability of such data, we seek to discover knowledge related to police violence by profiling typical victims, analysing violence across different states, and predicting the trend such incidents follow. Our first objective is to profile the victims, which we tackle by clustering the data and extracting the typical victim belonging to each cluster. This is done using three clustering algorithms (namely, K-Means, K-Medoids and Self-Organising Maps). We validate the generated profiles by observing how many killings in the dataset are accurately described by the different profiles. Our second objective focuses on the states with the most killings relative to their population. By clustering the data for these states, we find the typical victim profiles for those locations. Our third objective uses decision tree and random forest regressors, together with linear regression, to predict the number of future police killings based on information related to past incidents. Here, we consider each state’s population and unemployment rate to determine whether including such external information helps predict the number of killings accurately. The results are evaluated by comparing the predicted number of killings to the actual number which took place. The clustering and regression techniques found to be most suitable for our work are K-Means clustering and random forest regression, both producing better results than the other techniques considered.
We find that including population data during the crime prediction process improved the accuracy of our results: the smallest mean absolute error obtained indicates that predictions differ from the actual figures by only 3 killings on average. Despite the challenges of victim profiling, we have managed to produce profiles which, overall, cover between 25% and 70% of the designated test set. Thus, we believe that we have succeeded in fulfilling our objectives of victim profiling and crime prediction.
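The two evaluation measures mentioned in the abstract, profile coverage and mean absolute error, can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the field names, the toy records, the matching rule (`min_matches`) and the use of the most common value per field as the "typical" profile are all assumptions made for illustration only.

```python
from collections import Counter


def modal_profile(records, fields):
    """Most common value per field: a simple stand-in for a cluster's typical profile."""
    return {f: Counter(r[f] for r in records).most_common(1)[0][0] for f in fields}


def coverage(profile, records, min_matches):
    """Share of records that agree with the profile on at least min_matches fields."""
    hits = sum(
        1
        for r in records
        if sum(r[f] == v for f, v in profile.items()) >= min_matches
    )
    return hits / len(records)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical records: field names and values are invented, not taken from the dataset.
fields = ["age_band", "armed", "gender"]
cluster = [
    {"age_band": "25-34", "armed": "yes", "gender": "male"},
    {"age_band": "25-34", "armed": "no", "gender": "male"},
    {"age_band": "35-44", "armed": "yes", "gender": "male"},
]
test_set = [
    {"age_band": "25-34", "armed": "yes", "gender": "male"},
    {"age_band": "45-54", "armed": "no", "gender": "female"},
]

profile = modal_profile(cluster, fields)
share = coverage(profile, test_set, min_matches=3)

# Hypothetical yearly counts for one state: actual versus predicted.
mae = mean_absolute_error([10, 20, 30], [12, 17, 30])
```

Here `share` is the fraction of the toy test set described by the profile, and `mae` is the average number of incidents by which the toy predictions miss the actual counts, which is how a figure such as "3 killings" would be read.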
Description: B.Sc. IT (Hons)(Melit.)
Appears in Collections: Dissertations - FacICT - 2021
Dissertations - FacICTAI - 2021

Files in This Item:
Description: Restricted Access
Size: 2 MB
Format: Adobe PDF

Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.