Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/92094
Title: | Change detection in semi-structured documents |
Authors: | Bonnici, Wayne (2021) |
Keywords: | Digital forensic science; Microsoft Word; Word processing; XML (Document markup language); Algorithms |
Issue Date: | 2021 |
Citation: | Bonnici, W. (2021). Change detection in semi-structured documents (Bachelor's dissertation). |
Abstract: | Digital forensic investigations sometimes require sifting through and analysing a large number of documents handed in as evidence. A computer system can help an investigator establish exactly how a pair of documents differ by summarising the changes into separate categories, such as insertions and deletions. Traditionally, this has been done with diff utilities, which do not understand the hierarchical structure of an XML document and often report the wrong changes. That approach is cumbersome, and the investigator must perform several pre- and post-processing steps to ensure that changes are derived accurately. This project targets documents encoded in the Office Open XML .docx format, the default file format of Microsoft Word, and assumes that the documents contain text and track change annotations. It develops a tool called DocxDiff that is easy to use and requires minimal user interaction. DocxDiff uses a change detection algorithm specifically designed for XML documents, which provides an accurate delta. It ignores irrelevant deltas, such as nodes that are represented differently but are semantically equivalent. With the help of XPaths, it inspects only the nodes that have changed, without needing to traverse the whole document tree. DocxDiff encapsulates the delta and presents any altered track change state to the investigator in an easy-to-interpret summary of changes. The investigator is expected to use this summary to decide whether to shortlist the documents for further analysis. DocxDiff achieved very good track change detection accuracy and always classified changes in the correct category. |
Description: | B.Sc. IT (Hons)(Melit.) |
URI: | https://www.um.edu.mt/library/oar/handle/123456789/92094 |
Appears in Collections: | Dissertations - FacICT - 2021; Dissertations - FacICTCIS - 2021 |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
21BITSD007.pdf | Restricted Access | 5.55 MB | Adobe PDF |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.