This weekend I went to my first meeting of the North East Music Information Special Interest Group (NEMISIG), held at The Echo Nest in Cambridge, MA. NEMISIG is an informal get-together for music informatics researchers in the northeastern US to see what’s happening in the labs, what research is being conducted, and who’s new or has moved on to greener pastures.
LabROSA was pretty well-represented, and there was also a strong turn-out from Drexel’s MET-lab, Dartmouth’s Bregman Music and Audio Research Studio, and NYU’s Music and Audio Research Lab, plus individuals from The Echo Nest and other music research groups.
I also gave a quick talk about lyrics-to-audio alignment. Our basic idea is to take a cappella audio (vocals without instruments), align automatically-scraped web lyrics to it using state-of-the-art speech recogniser software, then use a few neat tricks to boost the accuracy and pair the alignment with the full polyphonic recordings. It’s work in progress, but I had fun putting this little demo together:
Here you can see an automatic alignment of the lyrics to the audio. The top pane shows the word-level alignment of the lyrics, whilst the bottom pane shows the phoneme-level (individual speech sound) alignment. It works pretty well! I’m hoping to scale this work up to a dataset of around 1,000 songs and write it up for a conference some place.
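The core of this kind of forced alignment can be sketched as a dynamic programme: given a per-frame score for each phoneme in the lyrics (in practice these would come from a speech recogniser’s acoustic model; here they are made up) we search for the best monotonic frame-to-phoneme path. This is only a toy illustration of the idea, not the actual system or toolkit used in the talk:

```python
# Toy sketch of forced alignment: given frame_scores[t][p], a (hypothetical)
# acoustic score for phoneme p at frame t, find the monotonic mapping of
# frames to phonemes with the best total score via dynamic programming.
# All names and numbers here are illustrative, not real model output.

def forced_align(frame_scores, num_phones):
    """Return, for each frame, the index of the phoneme it aligns to."""
    T = len(frame_scores)
    NEG = float("-inf")
    # dp[t][p] = best score covering frames 0..t with frame t on phoneme p;
    # the path must start on phoneme 0 and may only stay or advance by one.
    dp = [[NEG] * num_phones for _ in range(T)]
    back = [[0] * num_phones for _ in range(T)]
    dp[0][0] = frame_scores[0][0]
    for t in range(1, T):
        for p in range(num_phones):
            stay = dp[t - 1][p]                       # remain on phoneme p
            move = dp[t - 1][p - 1] if p > 0 else NEG # advance from p-1
            if stay >= move:
                dp[t][p] = stay + frame_scores[t][p]
                back[t][p] = p
            else:
                dp[t][p] = move + frame_scores[t][p]
                back[t][p] = p - 1
    # Backtrack from the last phoneme at the last frame.
    path = [num_phones - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Tiny example: 6 frames, 3 phonemes; each frame scores its "true" phoneme highest.
scores = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.8, 0.2],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
]
print(forced_align(scores, 3))  # -> [0, 0, 1, 1, 2, 2]
```

Word-level timings then fall out for free: each word’s start and end are just the boundaries of its first and last phonemes in the recovered path.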