Extended no-K-Means for search results clustering

Azzopardi, Joel; Staff, Chris; Layfield, Colin

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/103343

Title:	Extended no-K-Means for search results clustering
Authors:	Azzopardi, Joel; Staff, Chris; Layfield, Colin
Keywords:	Algorithms; Search engines; Data mining; Cluster analysis; Information retrieval
Issue Date:	2016
Publisher:	iSWAG Symposium
Citation:	Azzopardi, J., Staff, C., & Layfield, C. (2016). Extended No-K-Means for search results clustering. 2nd International Symposium on Web Algorithms (iSWAG), France.
Abstract:	The No-K-Means clustering algorithm is used for Search Results Clustering. It determines cluster membership using a similarity threshold and a Cluster Validity Index, rather than relying on prior knowledge of the target number of clusters. In this paper, we present an improvement to the algorithm and several new results. We justify the selection of the Generalized Dunn’s Index as the Cluster Validity Index. We compare results obtained by No-K-Means, Bisecting K-Means, Suffix Tree Clustering, and Lingo on the same Gold Standard collection. No-K-Means achieves even higher accuracy than previously reported when any Wikipedia snippets appearing in the list of results are used to ‘seed’ clusters. To show that No-K-Means is not dependent on Wikipedia result snippets, we remove them from the test and Gold Standard collection and compare the accuracy of No-K-Means with that of the other clustering algorithms. No-K-Means consistently produces better clusters. Finally, we show that the time complexity of No-K-Means is favourable compared to other clustering algorithms.
URI:	https://www.um.edu.mt/library/oar/handle/123456789/103343
Appears in Collections:	Scholarly Works - FacICTAI

Files in This Item:

File	Description	Size	Format
Extended_no-K-Means_for_search_results_clustering(2016).pdf		205.95 kB	Adobe PDF
