Title: Multi-lingual LSA with Serbian and Croatian : an investigative case study
Authors: Layfield, Colin
Ivanović, Dragan
Azzopardi, Joel
Keywords: Information retrieval -- Case studies.
Information storage and retrieval systems
Latent semantic indexing
Natural language processing (Computer science)
Search engines
Croatian language -- Data processing
Serbian language -- Data processing
Issue Date: 2017
Publisher: Springer
Citation: Layfield, C., Ivanović, D., & Azzopardi, J. (2017, September). Multi-Lingual LSA with Serbian and Croatian: An Investigative Case Study. Third International KEYSTONE Conference, Poland. pp. 155-164.
Abstract: One of the challenges in information retrieval is searching a corpus of documents that may contain multiple languages. This exploratory study expands upon earlier research applying Latent Semantic Analysis (LSA) across languages (so-called Multi-Lingual Latent Semantic Indexing, or ML-LSI/LSA). We experiment with this approach, and with a new one, in a multi-lingual context using two similar languages, Serbian and Croatian. Traditionally, the LSA approach requires a parallel corpus to train the system, with each pair of identical documents in the two languages combined into a single document. We repeat that approach and also experiment with creating a semantic space from the parallel corpus on its own, without merging the documents, to test the hypothesis that, with very similar languages, merging may not be required for good results.
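The abstract contrasts two ways of building an LSA semantic space from a parallel corpus: merging each translation pair into one training document, or keeping both language versions as separate documents. The sketch below illustrates the mechanics of both on a toy corpus. It is not the authors' implementation. It assumes whitespace tokenisation, raw term-frequency weighting, a rank k=3 truncated SVD via NumPy's `np.linalg.svd`, and the standard LSA fold-in of queries (d^T U_k S_k^{-1}). The toy sentences and all function names (`train_lsa`, `fold_in`, `cross_lingual_scores`) are hypothetical, and the sentences are illustrative rather than linguistically vetted.

```python
import numpy as np

# Illustrative sketch only: raw TF weighting, whitespace tokens and k=3 are
# assumptions made for this example, not the setup used in the paper.


def build_vocab(docs):
    """Map every distinct whitespace-separated token to a row index."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: i for i, tok in enumerate(vocab)}


def term_doc_matrix(docs, vocab):
    """Raw term-frequency matrix (terms x documents); unknown tokens are ignored."""
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.split():
            if tok in vocab:
                A[vocab[tok], j] += 1.0
    return A


def train_lsa(train_docs, k):
    """Build the semantic space: rank-k truncated SVD of the term-document matrix."""
    vocab = build_vocab(train_docs)
    U, s, _ = np.linalg.svd(term_doc_matrix(train_docs, vocab), full_matrices=False)
    return vocab, U[:, :k], s[:k]


def fold_in(docs, vocab, U_k, s_k):
    """Project documents or queries into the space: d_hat = d^T U_k S_k^{-1}."""
    return (term_doc_matrix(docs, vocab).T @ U_k) / s_k


def rank_by_cosine(query_vec, doc_vecs):
    """Cosine similarity of one folded-in query against each folded-in document."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    return (doc_vecs @ query_vec) / norms


# Hypothetical toy parallel corpus (Serbian / Croatian), for illustration only.
SR = ["voz kasni danas", "hleb je svež", "student upisuje univerzitet", "hemija je nauka"]
HR = ["vlak kasni danas", "kruh je svjež", "student upisuje sveučilište", "kemija je znanost"]

# Classic ML-LSI: each parallel pair is merged into a single training document.
merged = [s + " " + h for s, h in zip(SR, HR)]
# Alternative: keep both language versions as separate training documents.
unmerged = SR + HR


def cross_lingual_scores(train_docs, k=3, query="voz"):
    """Score a Serbian query against the Croatian documents in one semantic space."""
    vocab, U_k, s_k = train_lsa(train_docs, k)
    q = fold_in([query], vocab, U_k, s_k)[0]
    return rank_by_cosine(q, fold_in(HR, vocab, U_k, s_k))


for name, docs in [("merged", merged), ("unmerged", unmerged)]:
    scores = cross_lingual_scores(docs)
    print(name, int(np.argmax(scores)), np.round(scores, 3))
```

In the merged space, "voz" and "vlak" co-occur directly in one training document; in the unmerged space they are linked only through the words the two versions share ("kasni danas"). That shared-vocabulary overlap is what the unmerged approach relies on for closely related languages. A realistic setup would typically add term weighting, a much larger corpus and a larger k.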
ISBN: 9783319744971
ISSN: 1611-3349
Appears in Collections: Scholarly Works - FacICTAI

Files in This Item:
File: Restricted Access | Size: 232.66 kB | Format: Adobe PDF

Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.