Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/107790
| Title: | Find&define : automatic extraction of definitions from text |
| Authors: | Bezzina, Paolo (2022) |
| Keywords: | Definition (Philosophy); Neural networks (Computer science); Machine learning |
| Issue Date: | 2022 |
| Citation: | Bezzina, P. (2022). Find&define: automatic extraction of definitions from text (Bachelor's dissertation). |
| Abstract: | Definitions are the main means through which humans communicate and learn the meanings of concepts. Dictionaries are generally the best source of word definitions. However, keeping a dictionary up to date is tedious work: new definitions are continually needed as language evolves, with new words entering the language and existing words shifting in meaning. Automated extraction of textual definitions is therefore beneficial. It can assist in the curation of dictionaries and can also be used in a language-learning environment in which learners have access to definitions of new terms they encounter. The automatic extraction of definitions from text using machine learning techniques is a growing research field. Traditionally, this was done in a limited, structured, well-defined way using pattern matching. Recently, more advanced techniques, namely neural networks, have been applied to this problem. These also make it possible to recognise definitions that lack specific defining phrases such as ‘is’, ‘means’, or ‘is defined’. In this project, three main components make up the Definition Extraction pipeline. The first component classifies whether a given sentence is a defining sentence or not. The second component assigns each token in the sentence a label indicating its type, such as ‘term’, ‘definition’, ‘alias’, or ‘referential term’. These labels feed into the following component. The third and final component labels the relation between tags in a sentence, which can be ‘direct-definition’, ‘AKA’, or ‘supplements’, amongst others, finally forming a definition. To carry out these tasks, a pre-trained neural network is fine-tuned for the specific purpose of definition extraction. The project uses the DEFT Corpus, a dataset specifically aimed at definition extraction.
The project compares the performance of various experimental settings, including BERT, RoBERTa, DistilBERT, and ALBERT models, on each of the three components. From our experiments, we find that BERT performs best for sentence and relation classification, whilst ALBERT performs best on token classification. Definition extraction can be beneficial in a variety of situations, such as building dictionaries or knowledge graphs, which benefit from increased connectivity and relevance, especially for question-answering systems. |
| Description: | B.Sc. IT (Hons)(Melit.) |
| URI: | https://www.um.edu.mt/library/oar/handle/123456789/107790 |
| Appears in Collections: | Dissertations - FacICT - 2022; Dissertations - FacICTAI - 2022 |
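
The three-stage pipeline described in the abstract (sentence classification, token classification, relation classification) can be illustrated with a minimal sketch of the data flow. This is not the dissertation's implementation: the dissertation fine-tunes pre-trained transformer models for each stage, whereas every function below is a hypothetical rule-based stand-in that relies on cue phrases, which is exactly the limitation the neural approach is meant to overcome. All names (`CUES`, `Definition`, `find_cue`, `extract`, and so on) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Toy cue phrases (assumption), ordered longest first so that
# "is defined as" wins over the bare "is" at the same position.
CUES = ("is defined as", "means", "is")


@dataclass
class Definition:
    term: str
    definition: str
    relation: str


def find_cue(tokens: List[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices of the first cue phrase, if any."""
    lowered = [t.lower() for t in tokens]
    for start in range(len(lowered)):
        for cue in CUES:
            cue_tokens = cue.split()
            end = start + len(cue_tokens)
            if lowered[start:end] == cue_tokens:
                return start, end
    return None


def is_defining_sentence(tokens: List[str]) -> bool:
    """Stage 1 stand-in: a sentence 'defines' if a cue has text on both sides."""
    span = find_cue(tokens)
    return span is not None and span[0] > 0 and span[1] < len(tokens)


def tag_tokens(tokens: List[str]) -> List[str]:
    """Stage 2 stand-in: label each token 'term', 'definition', or 'O'."""
    start, end = find_cue(tokens)
    return ["term" if i < start else "O" if i < end else "definition"
            for i in range(len(tokens))]


def link_tags(tokens: List[str], labels: List[str]) -> Definition:
    """Stage 3 stand-in: relate the term span to the definition span."""
    term = " ".join(t for t, l in zip(tokens, labels) if l == "term")
    body = " ".join(t for t, l in zip(tokens, labels) if l == "definition")
    return Definition(term, body, "direct-definition")


def extract(sentence: str) -> Optional[Definition]:
    """Run the three stages in order; return None for non-defining sentences."""
    tokens = sentence.rstrip(".").split()
    if not is_defining_sentence(tokens):
        return None
    return link_tags(tokens, tag_tokens(tokens))
```

Calling `extract("A neuron is defined as a nerve cell.")` passes the sentence through all three stages, while a sentence with no cue phrase is rejected at the first stage and returns `None`. In a setup like the one the abstract describes, each stand-in would be replaced by a fine-tuned model, with the output labels of one stage feeding the next.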
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 2208ICTICT390900013592_1.PDF | Restricted Access | 1.41 MB | Adobe PDF | View/Open Request a copy |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.
