Please use this identifier to cite or link to this item:
https://www.um.edu.mt/library/oar/handle/123456789/103343
Title: | Extended no-K-Means for search results clustering |
Authors: | Azzopardi, Joel; Staff, Chris; Layfield, Colin |
Keywords: | Algorithms; Search engines; Data mining; Cluster analysis; Information retrieval |
Issue Date: | 2016 |
Publisher: | iSWAG Symposium |
Citation: | Azzopardi, J., Staff, C., & Layfield, C. (2016). Extended No-K-Means for search results clustering. 2nd International Symposium on Web Algorithms (iSWAG), France. |
Abstract: | The No-K-Means clustering algorithm is used for Search Results Clustering. It determines cluster membership using a similarity threshold and a Cluster Validity Index, rather than relying on prior knowledge of the target number of clusters. In this paper, we present an improvement to the algorithm and several new results. We justify the selection of Generalized Dunn’s Index as the Cluster Validity Index. We compare results obtained by No-K-Means, Bisecting K-Means, Suffix Tree Clustering, and Lingo on the same Gold Standard collection. No-K-Means achieves even higher accuracy than previously reported when any Wikipedia snippets appearing in the list of results are used to ‘seed’ clusters. To show that No-K-Means is not dependent on Wikipedia result snippets, we remove them from the test and Gold Standard collection and compare the accuracy of No-K-Means with that of the other clustering algorithms. No-K-Means consistently produces better clusters. Finally, we show that No-K-Means’s time complexity is favourable compared to other clustering algorithms. |
URI: | https://www.um.edu.mt/library/oar/handle/123456789/103343 |
Appears in Collections: | Scholarly Works - FacICTAI |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Extended_no-K-Means_for_search_results_clustering(2016).pdf | | 205.95 kB | Adobe PDF | View/Open |
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.