Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/92156

| Title: | Towards extracting, analysing and verifying statistical claims |
| Authors: | Galea, Matthew (2021) |
| Keywords: | Data sets -- Malta; Statistical Office of the European Communities; Time-series analysis; Databases -- Evaluation |
| Issue Date: | 2021 |
| Citation: | Galea, M. (2021). Towards extracting, analysing and verifying statistical claims (Bachelor’s dissertation). |
| Abstract: | Claim verification is the task of determining the veracity of a claim. Typical automatic approaches compare claims to textual sources, such as databases of previously fact-checked claims or peer-reviewed publications, which present a vast array of semantic challenges. A further complication is that the truthfulness of a claim is usually given on a five- or six-point scale, rather than just True or False. Moreover, human fact-checkers themselves have been shown to disagree on the same statements. This work proposes a novel approach which tackles such issues, with the caveat that only a subset of claims can be analysed. These will be referred to as statistically verifiable claims, since they can be verified using statistical analysis rather than compared to other textual sources. A statistically verifiable claim is made up of variables and events. Variables are quantitative entities that change over time (e.g. population). Events are either changes in variables (e.g. increase or decrease), or relationships between variables (e.g. causal event). For example, the claim by The Guardian: “However, a rise in temperature also appeared to lead to an increase in the amount of carbon dioxide in the atmosphere” is a statistically verifiable claim, containing two change events: rise in temperature, and increase in the amount of carbon dioxide in the atmosphere. Moreover, there is a causal event relationship between the two variables, where the variable temperature is the cause and the amount of carbon dioxide in the atmosphere is the effect. A solution was developed which extracts and analyses the statistically verifiable claims. Variables and their polarity (increase or decrease) are extracted using chunking techniques. Then, causal relationships are extracted using a Self-Attentive BiLSTM-CRF with Transferred Embeddings (SCITE). The causal entities are mapped to the variables using an ensemble system. A further dataset retrieval system is used to obtain time series data given the variables identified in the claim and the Eurostat dataset. These analyses allow for a future extension to the system or a human expert to label claims as True or False through variable time series analysis. An evaluation of the system implemented revealed that the identification of statistically verifiable claims and the extraction of change events achieved F1 scores of 97.1% and 85.5% respectively. The causal event extraction obtained an F1 score of 52.9%, a 22.7% improvement over the baseline model. In general, sentences input to the system were correctly analysed 79.3% of the time. |
| Description: | B.Sc. IT (Hons)(Melit.) |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/92156 |
| Appears in Collections: | Dissertations - FacICT - 2021; Dissertations - FacICTAI - 2021 |
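
The abstract's definition of a statistically verifiable claim (variables, change events, and causal events) can be illustrated with a small data structure. This is only a sketch of one possible representation, not the dissertation's implementation: the class names `Polarity`, `ChangeEvent`, `CausalEvent` and `StatisticalClaim` are hypothetical, and modelling a causal event as a link between two change events is an assumption made here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    """Direction of change of a variable, as described in the abstract."""
    INCREASE = "increase"
    DECREASE = "decrease"


@dataclass(frozen=True)
class ChangeEvent:
    """A change in one quantitative variable (hypothetical representation)."""
    variable: str
    polarity: Polarity


@dataclass(frozen=True)
class CausalEvent:
    """A causal relationship; linking two change events is an assumption."""
    cause: ChangeEvent
    effect: ChangeEvent


@dataclass(frozen=True)
class StatisticalClaim:
    """A claim made up of change events and optional causal events."""
    text: str
    change_events: tuple
    causal_events: tuple = ()

    def variables(self):
        # Variables in the order their change events were recorded.
        return [event.variable for event in self.change_events]


# The Guardian example from the abstract, encoded by hand.
temperature_rise = ChangeEvent("temperature", Polarity.INCREASE)
co2_rise = ChangeEvent(
    "amount of carbon dioxide in the atmosphere", Polarity.INCREASE
)
claim = StatisticalClaim(
    text=(
        "However, a rise in temperature also appeared to lead to an increase "
        "in the amount of carbon dioxide in the atmosphere"
    ),
    change_events=(temperature_rise, co2_rise),
    causal_events=(CausalEvent(cause=temperature_rise, effect=co2_rise),),
)

print(claim.variables())
```

In the system the abstract describes, these fields would be filled in by the extraction components; here they are entered manually to show the shape of the data only.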
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 21BITAI023.pdf | Restricted Access | 1.55 MB | Adobe PDF | |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
