Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/92094
Title: Change detection in semi-structured documents
Authors: Bonnici, Wayne (2021)
Keywords: Digital forensic science
Microsoft Word
Word processing
XML (Document markup language)
Algorithms
Issue Date: 2021
Citation: Bonnici, W. (2021). Change detection in semi-structured documents (Bachelor's dissertation).
Abstract: In digital forensics investigations, it is sometimes necessary to sift through and analyse a large number of documents handed in as evidence. A computer system can help an investigator establish the exact derivation of changes in a pair of documents by summarising the changes into separate categories, such as insertions and deletions. Traditionally, this had to be done with diff utilities that cannot understand the hierarchical structure of an XML document and can often detect the wrong changes. This approach is cumbersome, and the investigator must perform several pre- and post-processing steps to ensure accurate derivation of changes. This project targets documents encoded in the Office Open XML .docx format, the default file format of Microsoft Word, and assumes that the documents contain text and track change annotations. A tool called DocxDiff is developed that is easy to use and requires minimal user interaction. It uses a change detection algorithm that is specifically designed to work with XML documents and provides an accurate delta. DocxDiff ignores irrelevant deltas, such as nodes that are represented differently but are semantically equivalent. With the help of XPaths, it inspects only the nodes that have changed, without needing to traverse the whole document tree. DocxDiff encapsulates the delta and presents any altered track change state to the investigator in an easy-to-interpret summary of changes. The investigator is expected to use this summary to determine whether to shortlist the documents for further analysis. DocxDiff achieved very good track change detection accuracy, and changes are always classified in the right category.
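The idea of summarising track change annotations into categories can be illustrated with a minimal sketch. This is not DocxDiff's algorithm, which the abstract does not specify. It only assumes that tracked insertions and deletions appear as WordprocessingML `w:ins` and `w:del` elements, and it uses the limited XPath support in Python's standard library. The function name `summarise_track_changes` and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside .docx packages.
NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}


def summarise_track_changes(document_xml: str) -> dict:
    """Group tracked-change text into insertions and deletions (illustrative only)."""
    root = ET.fromstring(document_xml)
    summary = {"insertions": [], "deletions": []}

    # Inserted runs are wrapped in w:ins; their text lives in w:t.
    for ins in root.findall(".//w:ins", NS):
        text = "".join(t.text or "" for t in ins.findall(".//w:t", NS))
        if text:
            summary["insertions"].append(text)

    # Deleted runs are wrapped in w:del; their text lives in w:delText.
    for deleted in root.findall(".//w:del", NS):
        text = "".join(t.text or "" for t in deleted.findall(".//w:delText", NS))
        if text:
            summary["deletions"].append(text)

    return summary


# Hypothetical document body with one tracked insertion and one tracked deletion.
SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p>"
    "<w:r><w:t>Kept </w:t></w:r>"
    '<w:ins w:id="1" w:author="A"><w:r><w:t>added</w:t></w:r></w:ins>'
    '<w:del w:id="2" w:author="A"><w:r><w:delText>removed</w:delText></w:r></w:del>'
    "</w:p></w:body></w:document>"
)

print(summarise_track_changes(SAMPLE))
```

For a real file, the same function could be given the main document part, which is typically stored as `word/document.xml` inside the .docx ZIP package and can be read with Python's `zipfile` module. Comparing the summaries of two versions of a document would then show whether the track change state differs between them.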
Description: B.Sc. IT (Hons)(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/92094
Appears in Collections: Dissertations - FacICT - 2021
Dissertations - FacICTCIS - 2021

Files in This Item:
File: 21BITSD007.pdf (Restricted Access)
Size: 5.55 MB
Format: Adobe PDF

