Adaptive information extraction for document annotation in Amilcare

Ciravegna, Fabio; Dingli, Alexiei; Wilks, Yorick; Petrelli, Daniela

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/16947

Full metadata record

DC Field	Value	Language
dc.contributor.author	Ciravegna, Fabio	-
dc.contributor.author	Dingli, Alexiei	-
dc.contributor.author	Wilks, Yorick	-
dc.contributor.author	Petrelli, Daniela	-
dc.date.accessioned	2017-03-04T17:34:24Z	-
dc.date.available	2017-03-04T17:34:24Z	-
dc.date.issued	2002	-
dc.identifier.citation	Ciravegna, F., Dingli, A., Wilks, Y., & Petrelli, D. (2002). Adaptive information extraction for document annotation in Amilcare. 25th ACM/SIGIR International Conference on Research and Development in Information Retrieval, Tampere, 451.	en_GB
dc.identifier.issn	0163-5840	-
dc.identifier.uri	https://www.um.edu.mt/library/oar/handle/123456789/16947	-
dc.description.abstract	Amilcare is a tool for Adaptive Information Extraction (IE) designed to support active annotation of documents for the Semantic Web (SW). It can be used either for unsupervised document annotation or to support human annotation. Amilcare can be ported to new applications/domains without any knowledge of IE: users only need to annotate a small training corpus with the information to be extracted. It is based on (LP)2, a supervised learning strategy for IE able to cope with different text types, from newspaper-like texts to rigidly formatted Web pages and even a mixture of them [1][5]. Adaptation starts with the definition of a tag set for annotation, possibly organized as an ontology. Users then manually annotate a small training corpus. Amilcare provides a default mouse-based interface called Melita, where annotations are inserted by first selecting a tag from the ontology and then identifying the text area to annotate with the mouse. Unlike similar annotation tools [4, 5], Melita actively supports training corpus annotation. While users annotate texts, Amilcare runs in the background, learning how to reproduce the inserted annotations. Induced rules are silently applied to new texts and their results are compared with the user annotation. When its rules reach a (user-defined) level of accuracy, Melita presents new texts with a preliminary annotation derived from applying the rules. In this case users only have to correct mistakes and add missing annotations. User corrections are fed back to the learner for retraining. This focuses slow and expensive user effort on cases the rules do not yet cover, instead of requiring annotation of cases where satisfactory effectiveness has already been reached. Moreover, validating extracted information is a much simpler (and less error-prone) task than tagging bare texts, which speeds up the process considerably.
At the end of the corpus annotation process, the system is trained and the application can be delivered. MnM [6] and OntoMat-Annotizer [7] are two annotation tools that adopt Amilcare's learner. In this demo we simulate the annotation of a small corpus and show how and when Amilcare is able to support users in the annotation process, focusing on how the user can control the tool's proactivity and intrusiveness. We will also quantify this support with data derived from a number of experiments on corpora, focusing on training corpus size and on the correctness of suggestions as the corpus grows.	en_GB
dc.language.iso	en	en_GB
dc.publisher	The ACM Digital Library	en_GB
dc.rights	info:eu-repo/semantics/restrictedAccess	en_GB
dc.subject	Natural language processing (Computer science)	en_GB
dc.subject	Semantic Web	en_GB
dc.subject	Self-adaptive software	en_GB
dc.subject	Knowledge management	en_GB
dc.subject	Corpora (Linguistics)	en_GB
dc.title	Adaptive information extraction for document annotation in Amilcare	en_GB
dc.type	conferenceObject	en_GB
dc.rights.holder	The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author is properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder.	en_GB
dc.bibliographicCitation.conferencename	25th ACM/SIGIR International Conference on Research and Development in Information Retrieval	en_GB
dc.bibliographicCitation.conferenceplace	Tampere, Finland, 11-15 August 2002	en_GB
dc.description.reviewed	peer-reviewed	en_GB
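The active-annotation loop described in the abstract (rules learned in the background, silently compared with the user's annotation, then offered as preliminary annotations once a user-defined accuracy is reached, with corrections fed back for retraining) can be illustrated with a minimal Python sketch. It is not Amilcare's or Melita's code. ToyLearner is a hypothetical stand-in that memorizes token tags instead of inducing (LP)2 rules; the simulated user always ends at the gold annotation; "accuracy" is assumed to be the share of user-tagged tokens the rules reproduced; suggestions are assumed to stay on once the threshold is reached; and the names annotate_corpus, accuracy, threshold and SAMPLE_CORPUS are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Tags = List[Optional[str]]  # one tag (or None) per token


@dataclass
class ToyLearner:
    """Hypothetical stand-in for the IE learner: it memorizes the last tag
    each token received. (LP)2 induces generalized rules; this does not."""
    rules: Dict[str, str] = field(default_factory=dict)

    def train(self, tokens: List[str], tags: Tags) -> None:
        for token, tag in zip(tokens, tags):
            if tag is not None:
                self.rules[token] = tag

    def annotate(self, tokens: List[str]) -> Tags:
        return [self.rules.get(token) for token in tokens]


def accuracy(predicted: Tags, gold: Tags) -> float:
    """Share of user-tagged tokens that the rules reproduced exactly."""
    tagged = [(p, g) for p, g in zip(predicted, gold) if g is not None]
    if not tagged:
        return 1.0  # nothing to miss
    return sum(p == g for p, g in tagged) / len(tagged)


def annotate_corpus(
    corpus: List[Tuple[List[str], Tags]], threshold: float = 0.8
) -> List[Tuple[bool, float, int]]:
    """Return (suggestions shown?, rule accuracy, user edits) for each text."""
    learner = ToyLearner()
    suggesting = False
    log: List[Tuple[bool, float, int]] = []
    for tokens, gold in corpus:
        predicted = learner.annotate(tokens)
        # Below the threshold the user tags a bare text; above it, the rules'
        # output is presented as a preliminary annotation to be corrected.
        shown = predicted if suggesting else [None] * len(tokens)
        final = gold  # simulated user: fixes mistakes, adds missing tags
        edits = sum(s != f for s, f in zip(shown, final))
        score = accuracy(predicted, final)  # rules are always checked silently
        log.append((suggesting, score, edits))
        if score >= threshold:
            suggesting = True  # assumption: suggestions stay on once enabled
        learner.train(tokens, final)  # annotations/corrections fed back
    return log


# Hypothetical toy corpus: (tokens, gold tags) pairs.
SAMPLE_CORPUS = [
    (["Alice", "visits", "Paris"], ["PERSON", None, "CITY"]),
    (["Bob", "visits", "Paris"], ["PERSON", None, "CITY"]),
    (["Alice", "visits", "Paris"], ["PERSON", None, "CITY"]),
    (["Alice", "visits", "Rome"], ["PERSON", None, "CITY"]),
]

if __name__ == "__main__":
    for row in annotate_corpus(SAMPLE_CORPUS, threshold=0.5):
        print(row)
```

With threshold=0.5 on the toy corpus, suggestions start with the third text, which the simulated user accepts without edits; the fourth needs one correction instead of the two tags a bare text would require. Raising the threshold delays suggestions, which loosely mirrors the user-controlled intrusiveness the demo discusses.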
Appears in Collections:	Scholarly Works - FacICTAI

Files in This Item:

File	Description	Size	Format
Conference paper - Adaptive information extraction for document annotation in Amilcare.pdf (restricted access)	Adaptive information extraction for document annotation in Amilcare	131.37 kB	Adobe PDF
