Full metadata record
DC Field | Value | Language
dc.contributor.author | Ciravegna, Fabio |
dc.contributor.author | Dingli, Alexiei |
dc.contributor.author | Guthrie, David |
dc.contributor.author | Wilks, Yorick |
dc.identifier.citation | Ciravegna, F., Dingli, A., Guthrie, D., & Wilks, Y. (2003). Integrating information to bootstrap information extraction from web sites. IJCAI-03 Workshop on Information Integration on the Web, 2003. 1-6. | en_GB
dc.description.abstract | In this paper we propose a methodology to learn to extract domain-specific information from large repositories (e.g. the Web) with minimum user intervention. Learning is seeded by integrating information from structured sources (e.g. databases and digital libraries). Retrieved information is then used to bootstrap learning for simple Information Extraction (IE) methodologies, which in turn will produce more annotation to train more complex IE engines. All the corpora for training the IE engines are produced automatically by integrating information from different sources such as available corpora and services (e.g. databases or digital libraries). User intervention is limited to providing an initial URL and adding information missed by the different modules when the computation has finished. The information added or deleted by the user can then be reused, providing further training and therefore getting more information (recall) and/or more precision. We are currently applying this methodology to mining web sites of Computer Science departments. | en_GB
dc.publisher | International Joint Conferences on Artificial Intelligence Organization | en_GB
dc.subject | Information organization | en_GB
dc.subject | Information retrieval -- Automation | en_GB
dc.subject | Data mining | en_GB
dc.subject | Digital libraries | en_GB
dc.title | Integrating information to bootstrap information extraction from web sites | en_GB
dc.rights.holder | The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and can make use of the information contained in accordance with the Copyright Legislation provided that the author must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder. | en_GB
dc.bibliographicCitation.conferencename | IJCAI-03 Workshop on Information Integration on the Web | en_GB
dc.bibliographicCitation.conferenceplace | Acapulco, Mexico, 9-10/08/2003 | en_GB
Appears in Collections: Scholarly Works - FacICTAI

Files in This Item:
File | Description | Size | Format
OA - Integrating Information to Bootstrap Information Extraction from Web Sites.2-7.pdf | Integrating information to bootstrap information extraction from web sites | 109.78 kB | Adobe PDF

Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.