Title: Learning to harvest information for the semantic web
Authors: Ciravegna, Fabio
Chapman, Sam
Dingli, Alexiei
Wilks, Yorick
Keywords: Information retrieval -- Automation
Semantic Web
Semantic integration (Computer systems)
RDF (Document markup language)
Issue Date: 2004
Publisher: Springer Berlin Heidelberg
Citation: Ciravegna, F., Chapman, S., Dingli, A., & Wilks, Y. (2004). Learning to harvest information for the semantic web. First European Semantic Web Symposium (ESWS 2004), Heraklion. 312-326.
Abstract: In this paper we describe a methodology for harvesting information from large distributed repositories (e.g. large Web sites) with minimum user intervention. The methodology is based on a combination of information extraction, information integration and machine learning techniques. Learning is seeded by extracting information from structured sources (e.g. databases and digital libraries) or a user-defined lexicon. Retrieved information is then used to partially annotate documents. Annotated documents are used to bootstrap learning for simple Information Extraction (IE) methodologies, which in turn will produce more annotation to annotate more documents that will be used to train more complex IE engines and so on. In this paper we describe the methodology and its implementation in the Armadillo system, compare it with the current state of the art, and describe the details of an implemented application. Finally we draw some conclusions and highlight some challenges and future work.
Description: This work was carried out within the AKT project, sponsored by the UK Engineering and Physical Sciences Research Council (grant GR/N15764/01), and the Dot.Kom project, sponsored by the EU IST as part of Framework V (grant IST-2001-34038).
ISSN: 0302-9743
Appears in Collections: Scholarly Works - FacICTAI

Files in This Item:
File: OA Conference paper - Learning to Harvest Information for the Semantic Web.2-16.pdf
Description: Learning to harvest information for the semantic web
Size: 211.71 kB
Format: Adobe PDF

Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.