Please use this identifier to cite or link to this item: https://www.um.edu.mt/library/oar/handle/123456789/103818
Title: Identifying temporal trends based on perplexity and clustering: Are we looking at language change?
Authors: Boldsen, Sidsel; Agirrezabal, Manex; Paggio, Patrizia
Keywords: Computational linguistics; Algorithms; Cluster analysis -- Data processing; Document clustering
Issue Date: 2019
Publisher: Association for Computational Linguistics
Citation: Boldsen, S., Agirrezabal, M., & Paggio, P. (2019, August). Identifying temporal trends based on perplexity and clustering: Are we looking at language change? Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, Italy, 86-91.
Abstract: In this work we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We use perplexities derived from recurrent neural network (RNN) language models as a distance measure between documents and then cluster the documents on those distances. We argue that the perplexities calculated by such language models are representative of temporal trends. The clusters produced with the KMeans algorithm give insight into the differences in language across time periods, differences that are at least partly due to language change. We suggest that the temporal distribution of the individual clusters might provide a more nuanced picture of temporal trends than discrete time bins, and could therefore yield better results when used in a classification task.
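The abstract does not specify how perplexity is turned into a distance between documents. The following is a minimal sketch under assumptions that are ours, not the authors': a character-bigram model with add-one smoothing stands in for the RNN language models, each document gets its own model, the distance between documents i and j is the mean of the perplexity of j under i's model and of i under j's model, and each document's row of that distance matrix is clustered with plain Lloyd's k-means. All function names and the toy "charter" texts are hypothetical.

```python
import math
from collections import Counter


def train_bigram(text, vocab):
    """Character-bigram model with add-one smoothing.

    Assumption: a simple stand-in for the RNN language models used in the paper.
    """
    pair_counts = Counter(zip(text, text[1:]))
    first_counts = Counter(text[:-1])
    v = len(vocab)

    def prob(a, b):
        # P(b | a); sums to 1 over b in vocab
        return (pair_counts[(a, b)] + 1) / (first_counts[a] + v)

    return prob


def perplexity(prob, text):
    """exp of the mean negative log-probability of the bigrams in text."""
    log_sum = sum(math.log(prob(a, b)) for a, b in zip(text, text[1:]))
    return math.exp(-log_sum / (len(text) - 1))


def perplexity_distances(docs):
    """Symmetric matrix: D[i][j] = mean of PPL(doc j | model i) and PPL(doc i | model j)."""
    vocab = set("".join(docs))
    models = [train_bigram(d, vocab) for d in docs]
    n = len(docs)
    ppl = [[perplexity(models[i], docs[j]) for j in range(n)] for i in range(n)]
    return [[(ppl[i][j] + ppl[j][i]) / 2 for j in range(n)] for i in range(n)]


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


# Hypothetical toy corpus: three English-like and three Latin-like "charters".
DOCS = [
    "the king granted the land to the abbey and the monks",
    "the queen granted the mill to the church and the monks",
    "the lord granted the farm to the bishop and the church",
    "ego rex do et concedo terram ecclesie sancte marie",
    "ego regina do et concedo molendinum ecclesie sancti petri",
    "ego dominus do et concedo curiam episcopo et ecclesie",
]

labels = kmeans(perplexity_distances(DOCS), k=2)
print(labels)
```

Clustering the rows of the distance matrix is only one simple way to apply KMeans to pairwise perplexities; the setup in the paper may differ. With this toy corpus the two groups of texts should fall into separate clusters.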
URI: https://www.um.edu.mt/library/oar/handle/123456789/103818
ISBN: 9781950737314
Appears in Collections: Scholarly Works - InsLin



Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.