Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/22577
Title: Automatic clustering of news reports
Authors: Azzopardi, Joel
Keywords: Document clustering
Cluster analysis -- Data processing
Cluster analysis -- Computer programs
News Web sites
Issue Date: 2007
Publisher: University of Malta. Faculty of ICT
Citation: Azzopardi, J. (2007). Automatic clustering of news reports. 5th Computer Science Annual Workshop (CSAW’07), Msida. 11-23.
Abstract: The automatic clustering of news reports from various web-based news sites according to the event they cover not only facilitates browsing of news reports by users but may also serve as an initial stage in more complex systems such as Multi-Document Summarization or Document Fusion systems. In contrast to the usual document clustering scenarios, in which the document collections are static or quasi-static, news sites are continuously updated with reports concerning new events. Here, we present a News Report Clustering system that receives a stream of news reports and clusters them on the fly according to the event they cover. New clusters are automatically created as necessary for news reports covering ‘new’, previously unreported events. We compare the results of our system to those produced by a standard K-Means clustering system, and we show that our system performs significantly better even though the K-Means system was supplied with the correct number of clusters to produce. Our clustering system obtained, on average, 11.95% better recall, 28.68% better precision and 0.89% less fallout than the standard K-Means clustering system.
URI: https://www.um.edu.mt/library/oar//handle/123456789/22577
Appears in Collections: Scholarly Works - FacICTAI
Scholarly Works - FacICTCS
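
The abstract describes clustering a stream of reports on the fly, opening a new cluster whenever a report covers a previously unseen event. A minimal sketch of that general idea follows. It is not the paper's method: the record does not state the similarity measure, document representation, or cluster-creation rule, so the bag-of-words vectors, cosine similarity, the 0.3 threshold, and the names `StreamClusterer`, `vectorize`, and `cosine` are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Lowercased bag-of-words term counts (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class StreamClusterer:
    """Single-pass clustering: each report joins its most similar
    cluster, or starts a new one if nothing is similar enough."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed value, not from the paper
        self.centroids = []         # summed term counts per cluster
        self.members = []           # indices of the reports in each cluster
        self.count = 0              # number of reports seen so far

    def add(self, text):
        """Assign one incoming report and return its cluster index."""
        vec = vectorize(text)
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < self.threshold:
            # No existing cluster is close enough: treat as a new event.
            self.centroids.append(Counter(vec))
            self.members.append([])
            best = len(self.centroids) - 1
        else:
            # Fold the report's terms into the matching cluster.
            self.centroids[best].update(vec)
        self.members[best].append(self.count)
        self.count += 1
        return best
```

Unlike K-Means, a scheme of this kind does not need the number of clusters in advance; the threshold decides when a report is different enough to count as a new event, so its value would have to be tuned on real data.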

Files in This Item:
File: Proceedings of CSAW’07 - A2.pdf
Size: 296.91 kB
Format: Adobe PDF


Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.