|Title:||Music recognition, analysis, and automatic tagging using spectrogram pattern matching|
|Abstract:||A music file typically consists of an audio signal and metadata such as the title, artist(s), album, and track number, among others. This metadata, usually called an ID3 tag, is not contained in every available audio file. Like all other forms of media, music files on the internet tend to spread without the information needed to organise and search them appropriately. For instance, videos that can be streamed or downloaded from YouTube do not have such tags. This use case demonstrates the need for an automatic tagging system for files that lack the appropriate metadata. To address this problem, this project aims to research and develop an automatic tagging system for any type of audio file, starting with a reference dataset of songs from various genres, all of which are correctly tagged beforehand. A given audio file’s signal is then matched against those in the reference dataset, and the corresponding tag-set is attributed to the file if a close match is found. Otherwise, the system allows the user to input the tag-set manually, if known, and add it to the dataset. Existing studies of image classification and audio matching using spectrograms were examined to establish and confirm a valid setup for the system. The theoretical background of the technologies used, such as SIFT and BoVW, was also researched before development of the system began. The system was evaluated on a dataset of 1000 original files, which were synthetically modified into various test sets, and the advantages and disadvantages of two feature models, VLAD and BoVW, were compared. Accuracy rates tended to favour the former, but both have their own strengths, which were explored in some detail. Tests included changing distance measures, varying certain parameters, and using techniques such as spatial tiling.
Tests were also carried out to determine a threshold value beyond which a file would be classified as "Not In Database". This value, along with other improvements, was combined into a final overarching test, which yielded an accuracy rate of 84.1522%.|
|Appears in Collections:||Dissertations - FacICT - 2016|
Dissertations - FacICTAI - 2016
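The two feature models named in the abstract, and the "Not In Database" threshold, can be illustrated with a minimal sketch. It is not the dissertation's implementation. It assumes that local descriptors (for example SIFT descriptors taken from a spectrogram image) have already been extracted and that a codebook of visual words has already been learned, for instance by k-means. The function names `assign`, `bovw`, `vlad`, and `match`, the use of Euclidean distance, and the threshold value are all hypothetical choices made for illustration.

```python
import numpy as np


def assign(descriptors, codebook):
    """Index of the nearest codeword for each descriptor (Euclidean, assumed)."""
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    return (diffs ** 2).sum(axis=2).argmin(axis=1)


def bovw(descriptors, codebook):
    """Bag of Visual Words: normalised histogram of codeword assignments."""
    k = codebook.shape[0]
    hist = np.bincount(assign(descriptors, codebook), minlength=k).astype(float)
    return hist / max(hist.sum(), 1.0)


def vlad(descriptors, codebook):
    """VLAD: per-codeword sum of residuals, flattened and L2-normalised."""
    k, d = codebook.shape
    nearest = assign(descriptors, codebook)
    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            v[i] = (members - codebook[i]).sum(axis=0)
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def match(query_vec, reference_vecs, tags, threshold):
    """Return the tag of the closest reference, or reject if it is too far."""
    dists = np.linalg.norm(reference_vecs - query_vec, axis=1)
    best = int(dists.argmin())
    if dists[best] > threshold:  # hypothetical rejection rule
        return "Not In Database"
    return tags[best]


# Toy data: two codewords and three 2-D descriptors (illustrative only).
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]])
bovw_vec = bovw(descriptors, codebook)
vlad_vec = vlad(descriptors, codebook)
```

BoVW records only how many descriptors fall on each codeword, whereas VLAD also keeps the direction and magnitude of their offsets from it, so a VLAD vector has `k * d` entries instead of `k`.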
Files in This Item:
|6.03 MB||Adobe PDF|
Items in OAR@UM are protected by copyright, with all rights reserved, unless otherwise indicated.