Music Information Retrieval

MagnaTagATune is a dataset of almost 26,000 audio tracks, each 29 seconds long. The tracks are sampled at 16 kHz, so their bandwidth is limited to 8 kHz. Every track is annotated with a binary value in each of 189 categories. The dataset comes with annotations in CSV and MySQL formats, and some Python scripts are available too (I didn't use them; I wrote my own, available here).
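
The numbers above can be checked with a few lines of arithmetic. This is only a small sketch based on the sampling rate and track length quoted above; the variable names are mine and purely illustrative.

```python
SAMPLE_RATE_HZ = 16000  # sampling rate quoted above
TRACK_SECONDS = 29      # track length quoted above

# The highest frequency a sampled signal can represent is half the sampling rate.
nyquist_hz = SAMPLE_RATE_HZ / 2

# Number of samples in one track.
samples_per_track = SAMPLE_RATE_HZ * TRACK_SECONDS

print(nyquist_hz)         # 8000.0
print(samples_per_track)  # 464000
```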

Here is a simple example of using my scripts:

    # Path to the annotation file. In this case the folders with the audio
    # tracks should also be placed in /home/user/Magnatagtune/
    filename = '/home/user/Magnatagtune/annotations_final.csv'

    # Open the annotation file
    file = open(filename, 'rb')

    # Create a CSVParser object (the class comes from my scripts)
    mangaCSV = CSVParser(file)

    # Get the names of the songs in the 'singer' category. Returns a list of strings
    print 'Files with singer category'
    print mangaCSV.printFilesInCategory('singer')

    # Get all tags of a track
    print 'Categories of burnshee_thornside-rock_this_moon-01-bad_bad_luck-117-146'
    print mangaCSV.printCategoriesOfFile('burnshee_thornside-rock_this_moon-01-bad_bad_luck-117-146')
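
If you don't have my scripts at hand, the same kind of lookup can be roughly sketched with Python's standard csv module. This is not my CSVParser, just a minimal illustration. It assumes the annotation file is tab-separated, with a clip_id column, one 0/1 column per tag and a final mp3_path column. The helper names and the tiny in-memory sample are made up for the example.

```python
import csv
import io

# Tiny made-up sample in the assumed layout (tab-separated, 0/1 per tag).
SAMPLE = (
    "clip_id\tsinger\tguitar\tmp3_path\n"
    "1\t1\t0\ta/track_one.mp3\n"
    "2\t0\t1\tb/track_two.mp3\n"
    "3\t1\t1\tc/track_three.mp3\n"
)


def load_rows(handle):
    """Read all annotation rows as dictionaries keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def files_in_category(rows, category):
    """Return the mp3 paths of the tracks tagged with the given category."""
    return [row["mp3_path"] for row in rows if row[category] == "1"]


def categories_of_file(rows, mp3_path):
    """Return the tags set to 1 for the track with the given mp3 path."""
    for row in rows:
        if row["mp3_path"] == mp3_path:
            return [name for name, value in row.items()
                    if name not in ("clip_id", "mp3_path") and value == "1"]
    return []


rows = load_rows(io.StringIO(SAMPLE))
print(files_in_category(rows, "singer"))
print(categories_of_file(rows, "c/track_three.mp3"))
```

With a real annotation file you would pass an opened file instead of the in-memory sample, for example `open(filename, newline='')` in Python 3, provided the file really follows the layout assumed here.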

MagnaTagATune is a nice dataset because of its fairly good annotations and large number of tracks. It can be used not only for genre classification but also for mood classification and instrument classification/detection.

You can download the whole dataset from this place.

Saturday, 31 May 2014

The MagnaTagATune Dataset