Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/135517
Title: Leveraging large language models for security log analysis
Authors: Amankwa, Davis Poku (2024)
Keywords: Computer security
Natural language processing (Computer science)
Machine learning
Neural networks (Computer science)
Artificial intelligence
Issue Date: 2024
Citation: Amankwa, D. P. (2024). Leveraging large language models for security log analysis (Master’s dissertation).
Abstract: In the field of cybersecurity, the demand for skilled analysts is rapidly increasing, as underscored by a projected 32% growth in the United States from 2018 to 2028. Amid this growing demand, analysts face fatigue, driven by incessant alerts and complex threat landscapes, which can undermine their efficiency and promptness. In this study, we explore leveraging artificial intelligence, specifically a chatbot assistant, to enhance the cybersecurity investigation workflow. By facilitating interactions with security logs via natural language prompts, the solution aims to reduce cognitive load and streamline investigations. The objectives of the study are the creation of a dataset derived from the MITRE ATT&CK CSV Enterprise Attack Detection Techniques to train Large Language Models (LLMs) to generate security analysis queries from natural English language, and the evaluation and enhancement of the state-of-the-art LLMs used in this study. The proposed solution encompasses fine-tuning the GPT-3 and GPT-3.5 models on the dataset, created during the study, of security incident questions in English and their corresponding Microsoft Sentinel Security Information and Event Management (SIEM) queries. The evaluation phase focuses on the models’ proficiency in translating natural language questions into accurate security analysis queries, with an emphasis on practical applicability. The best-performing instance of the trained models obtained a 39.5% exact match rate, along with notable improvements in token-level accuracy and BLEU scores, through hyperparameter optimization during fine-tuning. However, the study has limitations, including reliance on synthetic data and a scarcity of real-world testing across diverse SIEMs. Future work could create real-world datasets encompassing diverse attack techniques and explore different LLM architectures. The development of open-source SIEM datasets is encouraged to broaden accessibility and empower both experienced and novice cybersecurity analysts.
Description: M.Sc.(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/135517
Appears in Collections: Dissertations - FacICT - 2024
Dissertations - FacICTAI - 2024

Files in This Item:
File: 2419ICTICS520005058967_1.PDF
Description: Restricted Access
Size: 3.02 MB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.