Navigating through a clustered search-space

Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/35896

Title:	Navigating through a clustered search-space
Authors:	Pace, Brian
Keywords:	Algorithms; Search engines
Issue Date:	2018
Citation:	Pace, B. (2018). Navigating through a clustered search-space (Bachelor's dissertation).
Abstract:	Results presented by a search engine to the user usually consist of a collection of relevant web pages and documents, each typically accompanied by a short snippet of the resource. Search-engine optimization or accumulated user history determines how the results are ranked, so that the results the search engine considers most relevant appear at the top of the list for quick retrieval. Yet these results may fail to provide what the user requires, because the query keywords may be inaccurate or ambiguous. Homographs are one example of this problem: the user has to go through many results on different topics to find a suitable one. The user also has to deal with a large number of pages of results, which may conceal the required web resource and may discourage the user from investing further time in searching. Search result clustering can reduce this problem. This project aims to implement a simple, easy-to-use navigation system for clustered search results based on No-K-Means, a search result clustering system. When the user submits a query, the query is sent to Bing and the results are clustered using No-K-Means. The user can view each cluster by an automatically generated label or by the document title of a representative result in the cluster. A user can `drill down' into a cluster, in which case an expanded query is automatically generated and submitted to Bing. The new results are clustered and presented to the user. Our approach is evaluated using a web-based application consisting of three web pages in total. The user can sign up or log in, enter the initial query, and navigate through the visualization of the result clusters. When the ideal results are found, the user presses a submit button, which triggers a request for a large number of results using the initial query.
The application then calculates the percentage of the ideal results that are present in this larger set. Afterwards, the user is redirected to a statistics page showing various user statistics, such as the number of clicks, the session duration and the number of drill-downs. The user has the option to start another session with updated user history. An SQL relational database stores the results, user history and user details for efficient storage and retrieval. The evaluation also includes an online questionnaire, used to gather extra feedback on the structure and effectiveness of this method of searching. The visualization takes the form of a hierarchical tree, as this is one of the most efficient ways to visualize a hierarchy of nodes. Drilling down and rolling back are very intuitive in this form: it comes naturally to a user to click on leaf nodes to expand the tree and on non-leaf nodes to hide any child nodes. The evaluation was carried out with 40 users in order to analyze the performance of the visualization with regard to user interaction, and to compare the efficiency of the clustering system with an unclustered list of results using Subtopic Reach Time. The inclusion of user history was also tested by observing the improvement between unranked and ranked clusters while the users were going through the visualization. Every user filled in a feedback form to assess whether the design was intuitive and user-friendly, and whether it guided the user to their goal with ease without having to edit the query manually. The results show that the users preferred the proposed system over ranked lists of results and that user history improved the overall experience, with further improvements listed as future work.
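The abstract names two quantities used in the evaluation: the percentage of the user's ideal results that reappear in the larger result set, and Subtopic Reach Time (SRT). The sketch below shows one way these could be computed; it is not the dissertation's code. It assumes results are identified by URL strings and that relevance judgments are given per subtopic, and all function names are hypothetical. For the clustered case it uses the SRT formulation commonly applied to clustered results (position of the cluster label plus position of the first relevant result inside that cluster) and, as an assumption, takes the cluster that minimizes this sum; subtopics with no relevant result are skipped.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical evaluation helpers; names and data shapes are illustrative assumptions.


def ideal_result_coverage(ideal: Set[str], large_result_set: Sequence[str]) -> float:
    """Percentage of the ideal results (URLs) that also appear in the larger result list."""
    if not ideal:
        return 0.0
    found = ideal.intersection(large_result_set)
    return 100.0 * len(found) / len(ideal)


def srt_ranked_list(ranked: Sequence[str], relevant: Dict[str, Set[str]]) -> float:
    """Mean SRT for a plain ranked list: 1-based rank of the first result relevant to each subtopic."""
    times: List[int] = []
    for docs in relevant.values():
        ranks = [i for i, url in enumerate(ranked, start=1) if url in docs]
        if ranks:  # assumption: unreached subtopics are skipped
            times.append(ranks[0])
    return sum(times) / len(times) if times else float("inf")


def srt_clustered(clusters: List[List[str]], relevant: Dict[str, Set[str]]) -> float:
    """Mean SRT for clustered results: cluster position + position of the first relevant
    result inside that cluster, taking the cluster that minimizes this sum."""
    times: List[int] = []
    for docs in relevant.values():
        best = None
        for c_pos, cluster in enumerate(clusters, start=1):
            for r_pos, url in enumerate(cluster, start=1):
                if url in docs:
                    reach = c_pos + r_pos
                    if best is None or reach < best:
                        best = reach
                    break  # only the first relevant result in each cluster counts
        if best is not None:
            times.append(best)
    return sum(times) / len(times) if times else float("inf")
```

For example, with `ranked = ["a", "b", "c", "d", "e", "f"]`, subtopics `{"s1": {"c"}, "s2": {"f"}}` and clusters `[["a", "c"], ["b", "f"], ["d", "e"]]`, the ranked list gives a mean SRT of 4.5 and the clustered view 3.5, because each relevant result sits near the top of its cluster.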
The statistics also show that fewer results overall are examined in this visualization and that users are shown more relevant results that would otherwise be hidden in a ranked list of unclustered results. An additional advantage is that more relevant results are obtained by automatically expanding the query with discriminatory terms as the user `drills down' through clusters, without the user having to modify the query manually.
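The abstract does not say how No-K-Means chooses the discriminatory terms added on a drill-down. As an illustration only, the sketch below swaps in a simple heuristic (an assumption, not the dissertation's method): each candidate term is scored by the fraction of snippets in the chosen cluster that contain it minus the fraction of snippets in the other clusters that contain it, and the top `k` terms not already in the query are appended. The stopword list, `tokenize`, `doc_freq` and `expand_query` are all hypothetical.

```python
import re
from collections import Counter
from typing import List, Sequence

# Hypothetical sketch of drill-down query expansion; not the No-K-Means implementation.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "for", "to", "is", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens with a small, assumed stopword list removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def doc_freq(snippets: Sequence[str]) -> Counter:
    """Number of snippets in which each term appears at least once."""
    df: Counter = Counter()
    for snippet in snippets:
        df.update(set(tokenize(snippet)))
    return df


def expand_query(query: str, chosen: Sequence[str], others: Sequence[str], k: int = 2) -> str:
    """Append the k terms that best separate the chosen cluster's snippets from the rest."""
    in_df, out_df = doc_freq(chosen), doc_freq(others)
    query_terms = set(tokenize(query))

    def score(term: str) -> float:
        # Share of chosen-cluster snippets containing the term minus
        # share of other-cluster snippets containing it.
        p_in = in_df[term] / max(len(chosen), 1)
        p_out = out_df[term] / max(len(others), 1)
        return p_in - p_out

    candidates = [t for t in in_df if t not in query_terms]
    # Highest score first; ties broken alphabetically so the output is deterministic.
    best = sorted(candidates, key=lambda t: (-score(t), t))[:k]
    return " ".join([query, *best])
```

With the homograph query `"jaguar"`, chosen-cluster snippets `["Jaguar cat habitat in the rainforest", "The jaguar is a big cat of the Americas"]` and other-cluster snippets `["Jaguar car dealership prices", "New Jaguar car models and prices"]`, the sketch returns `"jaguar cat americas"`: `cat` appears in every chosen snippet and no other, so it ranks first, while the tie among the remaining terms is broken alphabetically. A real system would likely weight terms more carefully (for example with TF-IDF), but the principle of favouring terms concentrated in the selected cluster is the same.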
Description:	B.Sc. ICT (Hons) Artificial Intelligence
URI:	https://www.um.edu.mt/library/oar//handle/123456789/35896
Appears in Collections:	Dissertations - FacICT - 2018; Dissertations - FacICTAI - 2018

Files in This Item:

File	Description	Size	Format
18BSCIT011.pdf (restricted access)		1.27 MB	Adobe PDF
