Title: Extended No-K-Means for search results clustering
Authors: Azzopardi, Joel; Staff, Chris; Layfield, Colin
Keywords: Algorithms; Search engines; Data mining; Cluster analysis; Information retrieval
Issue Date: 2016
Publisher: iSWAG Symposium
Citation: Azzopardi, J., Staff, C., & Layfield, C. (2016). Extended No-K-Means for search results clustering. 2nd International Symposium on Web Algorithms (iSWAG), France.
Abstract: The No-K-Means clustering algorithm is used for Search Results Clustering. It determines cluster membership using a similarity threshold and a Cluster Validity Index rather than prior knowledge of the number of clusters to create. In this paper, we present an improvement to the algorithm and several new results. We justify the selection of Generalized Dunn’s Index as the Cluster Validity Index. We compare results obtained by No-K-Means, Bisecting K-Means, Suffix Tree Clustering, and Lingo on the same Gold Standard collection. No-K-Means achieves even higher accuracy than previously reported when any Wikipedia snippets appearing in the list of results are used to ‘seed’ clusters. To show that No-K-Means is not dependent on Wikipedia result snippets, we remove them from the test and Gold Standard collection and compare the accuracy of No-K-Means with that of the other clustering algorithms. No-K-Means consistently produces better clusters. Finally, we show that No-K-Means’s time complexity is favourable compared to other clustering algorithms.
Appears in Collections: Scholarly Works - FacICTAI

Files in This Item:
File: Extended_no-K-Means_for_search_results_clustering(2016).pdf
Size: 205.95 kB
Format: Adobe PDF

Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.