Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/92245
Title: Sunago : user directed browsing using subtopic detection and collaborative filtering
Authors: Muscat, Karl (2007)
Keywords: Browsers (Computer programs)
Web search engines
Internet searching
Issue Date: 2007
Citation: Muscat, K. (2007). Sunago : user directed browsing using subtopic detection and collaborative filtering (Bachelor's dissertation).
Abstract: The amount of information available online is too large for a human being to handle cognitively. Most of the information on the Web is accessible only through a web search engine. Search engines are good at returning long lists of results for a given query, and their indexing and ranking methods have improved drastically. People often provide very little information to these services and yet expect high-quality results. The average query length is estimated at 2.35 words, and 58% of users never view more than the first page of results on their first search. This can only mean that either users are satisfied with the first few results returned or the available search technologies are limiting the users' experience. In the first case, a perfect query must be submitted, which, as Jansen et al. argue, is not normal in regular IR systems, where query modification "is very much the way of doing things". Sunago attempts to improve the searching experience by providing a document retrieval system that delivers personalised content with minimal user intervention. The returned web content will have been recommended by other users whose interests are similar to those of the active user. By rating web content, users enrich their interest models and automatically form a network of like-minded users who exchange similar and relevant content. This is achieved by constructing models of the webpages that users see, together with models that reflect the current and long-term interests of users. Clustering techniques are then used to group together users with similar tastes and pages with similar content. In an evaluation of Sunago carried out by real users, we found that the majority (72.72%) of users rated the Google results higher than the recommendations (51.51%) or equal to the recommendations (33.33%). 
We noted that 63.64% of ratings judged the Google results to be either perfect or good, and 30.3% of ratings judged the recommendations to be good or perfect. We also noted that none of the volunteers considered either the Google results or the recommendations useless. In fact, the lowest rating given by the users was 2 - 'Not what I was looking for.': 6.06% for Google results and 21.21% for the recommendations. The recommendations given by our system had a somewhat high incidence of low ratings. This is due to the limited amount of data that we had. Although we tried to collect data from specific sources to reduce data sparsity, and although the number of indexed pages and ratings was reasonable, the resulting clusters still had a prevalence of one-element clusters, especially in webpage clustering. From the results and data collected, we concluded that we did not have enough overlap between users, ratings and pages, which resulted in sparse data and poor recommendations. Despite the negative results, the ratings showed that 30.3% of recommendations were either very good or perfect. Although we cannot be fully certain, given the few user opinions collected, the system is working as desired and, in our opinion, the negative results can only be attributed to the sparsity of data.
Description: B.Sc. IT (Hons)(Melit.)
URI: https://www.um.edu.mt/library/oar/handle/123456789/92245
Appears in Collections: Dissertations - FacICT - 1999-2009
Dissertations - FacICTCS - 1999-2007

Files in This Item:
B.SC.(HONS)IT_Muscat_Karl_2007.PDF (Restricted Access), 12.85 MB, Adobe PDF

