This weekend I went to my first meeting of the North East Music Information Special Interest Group (NEMISIG), held at The Echo Nest in Cambridge, MA. NEMISIG is an informal get-together for everyone in the northeastern region to see what’s happening in the labs, what research is being conducted, and who’s new or has moved on to greener pastures.

LabROSA was pretty well represented, but there was also a strong turn-out from Drexel’s MET-lab, Dartmouth’s Bregman Music and Audio Research Studio, NYU’s Music and Audio Research Lab, and individuals from The Echo Nest and other music research labs.

Saw some really great stuff, including talks by Jessica Thompson on neurosemantic encoding of musical (and more general sound/audio) events, and a neat demo of robot musicians by Eric Schmid:

I also gave a quick talk about lyrics-to-audio alignment. Our basic idea is to take a cappella audio (vocals without instruments) and automatically align lyrics scraped from the web to it using state-of-the-art speech recognition software. A few neat tricks boost the accuracy, and the resulting alignment is then paired with the full polyphonic recording. It’s a work in progress, but I had fun putting this little demo together:

Here you can see an automatic alignment of the lyrics to the audio. In the top pane is the word-level alignment of the lyrics, whilst in the bottom pane is the phoneme (sub-word unit) level alignment. It works pretty well! I’m hoping to scale this work up to a dataset of around 1,000 songs and write it up for a conference some place.


BBC Documentary – How Music Makes us Feel

Just a quick post – tonight a documentary about one of my research areas, music and emotion, was shown on BBC1. The film looks at why music evokes emotion, describes some of the most characteristic emotions humans experience when listening to music, and which musical techniques are used to draw out these feelings.

The film also features footage from CMMR 2012 (Computer Music Modelling and Retrieval) where I presented some of my work, which aims to discover which emotions are most easily described by combinations of audio, lyrics, and social tags. Find out more on my publications page.

The documentary itself can be seen on iPlayer here for the next seven days. Look out for a familiar face around 32 minutes in! (people outside the UK, you can view this video if you have access to a VPN….)

Music Hackathon, October 2012

So before Sandy landed and caused chaos in the city, I went to a monthly music hackathon. Working on the weekend after a few too many cocktails in Chelsea might not sound like great fun, but I can tell you it’s a really great event.

The concept of a hackathon is to ‘hack together’ some code or an application in a few hours. It’s a cool way to explore some ideas you might have had on the back burner but never had a chance to code up, or some stuff that doesn’t really fit into your thesis/grant (here’s the obligatory Wikipedia link). Hackathons are regularly used or hosted by Facebook, Spotify, The Echo Nest and Google for getting quick ideas tested fast – and sometimes failing fast too!

Our plan at LabROSA was to port a load of Dan Ellis’ MATLAB scripts over to Python. For non-computer scientists, this basically means that it will all be available without an expensive MATLAB license and can be used to foster more research.

If you’re a computer scientist, you still might not think it’s too exciting! But I can assure you, there’s a LOT of cool stuff on Dan’s page for manipulating audio: audio fingerprinting (à la Shazam), aligning MIDI scores to audio for automatic ‘page turning’ and rehearsal for musicians, automatic beat tracking, and phase vocoding (making signals faster/slower or higher/lower in pitch without distortion), amongst many other things.

So, at some point (stay tuned…) all of the above and much more will be free to researchers! However, to market this work, it was decided we needed something shiny to show off, especially for the end of day presentations!

I decided to see if I could use some of these scripts to automatically generate ‘gear shifts’ in pop music. A gear shift is basically a really cheesy key shift in a pop song where the chorus is repeated a semitone/tone up to add interest to the tune. It’s a great way of adding an extra minute to a song, and literally ‘lifts you up’ just as the song is becoming dull. They’re a staple for just about any X Factor Christmas or Westlife track, but the best example I could find is Whitney Houston’s I Will Always Love You (skip to 3 minutes).

Boom! What a great floor tom hit. So, my plan was to automatically ‘gear-shift’ any song. Then any song can be made 20% more awesome! Turns out it’s quite tricky, but you can do a pretty good job using some of Dan’s code. I first extracted beat-synchronous chroma features (read as: description of pitch evolution at the beat level) and used these to automatically find the chorus. Below is a self-similarity matrix for each beat, so pixel (i,j) represents the (cosine) similarity between beat i and beat j.

Dark colours are high similarity, and I smoothed the matrix in the top pane to highlight long-term similarity and allow some local dissimilarity. Then I looked for strong diagonal stripes, which in theory represent large repeated sections (such as a chorus). Finding these is really the tough part, but in red I’ve highlighted the best candidate for this song (I biased it to prefer beats near the end of the song).
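
For the curious, the stripe hunt described above can be sketched in a few lines of plain Python. This is only a toy illustration, not the code from the hack: the function names, the random stand-in "chroma" vectors and the planted repeat are all made up here, and it skips the bias towards the end of the song.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def self_similarity(features):
    """Matrix whose entry (i, j) is the cosine similarity of beat i and beat j."""
    return [[cosine_similarity(u, v) for v in features] for u in features]


def best_repeat(ssm, length, min_lag):
    """Find beats i < j whose following `length` beats are most similar.

    Averaging ssm[i+k][j+k] over k is the 'diagonal stripe' score: it rewards
    long runs of similarity and tolerates a few locally dissimilar beats.
    """
    n = len(ssm)
    best = (None, None, -math.inf)
    for i in range(n - length + 1):
        for j in range(i + min_lag, n - length + 1):
            score = sum(ssm[i + k][j + k] for k in range(length)) / length
            if score > best[2]:
                best = (i, j, score)
    return best


# Toy data (hypothetical): 40 beats of random 12-bin vectors, with the
# 8 beats starting at beat 5 copied to beat 25 to act as a repeated chorus.
rng = random.Random(0)
features = [[rng.random() for _ in range(12)] for _ in range(40)]
features[25:33] = [row[:] for row in features[5:13]]

i, j, score = best_repeat(self_similarity(features), length=8, min_lag=8)
print(i, j, round(score, 3))  # the planted repeat: beats 5 and 25, score close to 1.0
```

Real chroma features are far messier than this, which is why picking the right stripe is the tough part.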

After this it’s pretty simple to grab this section of the audio, fade out before, phase vocode the detected chorus up a semitone, add some compression for drama and, voilà!
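
To give a feel for the numbers involved, here is a rough sketch of the fade-out and the semitone arithmetic. Note that it is not a phase vocoder: it shifts pitch by naive resampling, which also shortens the clip, whereas a phase vocoder would first time-stretch by the same ratio so the chorus keeps its length. The function names and the test tone are made up for illustration.

```python
import math

SEMITONE = 2.0 ** (1.0 / 12.0)  # frequency ratio of one equal-tempered semitone


def fade_out(signal, n_fade):
    """Return a copy of the signal whose last n_fade samples fade linearly to zero."""
    out = list(signal)
    n_fade = min(n_fade, len(out))
    for k in range(n_fade):
        gain = 1.0 - (k + 1) / n_fade  # falls linearly, reaching 0 on the last sample
        out[len(out) - n_fade + k] *= gain
    return out


def resample_pitch_shift(signal, semitones):
    """Naive pitch shift: read through the signal `ratio` times faster.

    Uses linear interpolation between samples. Pitch goes up by `semitones`,
    but the duration shrinks by the same ratio (unlike a phase vocoder).
    """
    ratio = SEMITONE ** semitones
    n_out = int(round(len(signal) / ratio))
    out = []
    for n in range(n_out):
        pos = n * ratio
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


# Hypothetical test tone: one second of A440 at an 8 kHz sample rate.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate + 0.5) for t in range(sample_rate)]
shifted = resample_pitch_shift(tone, 1)  # roughly 466 Hz, and about 6% shorter
```

Twelve of these semitone steps multiply out to exactly one octave, i.e. a doubling of frequency.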

Pretty neat huh?! Sure it’s not perfect, the vocals get a little chipmunky as it’s already quite high register, but that’s the beauty of a hack!

Stay tuned for the release of some cool Python code for phase vocoding, structural segmentation (finding choruses) etc. in the near future.

MIREX 2012

The results for the 2012 Music Information Retrieval Evaluation eXchange (MIREX) for chord recognition have been released!

There were two tasks this year: one on a dataset of 217 tracks by The Beatles, Queen and Zweieck (these tracks have been used extensively by researchers in the last few years and are known as the MIREX09 dataset), and another where the test set is a set of 197 popular music tracks provided by McGill University (the McGill dataset).

We submitted 4 algorithms to each of these tasks, and I’m happy to say that on the MIREX09 dataset, we came 1st, 3rd, 4th and 5th and on the McGill dataset we took the top 4 spots! Our basic pipeline is shown below:

We begin by estimating the tuning of the piece. Songs not in standard pitch confuse our feature extraction and could mean that the entire piece is transcribed a semitone sharp or flat. We then extract bass and treble chromagrams from the harmonic content of the audio, using a process known as Harmonic and Percussive Source Separation (HPSS). These features are then smoothed over the duration of 1 estimated beat and fed into our decoder, the Harmony Progression Analyzer (HPA), which is essentially a generalised Hidden Markov Model (HMM) with an augmented state space with hidden nodes representing chords, keys, and bass notes.
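
The beat-level smoothing step is easy to picture in code. Below is a minimal sketch, assuming the chromagram is a list of 12-bin frames and the beat tracker returns frame indices; the function name and the choice of the median as the smoother are my assumptions for illustration, not a description of what HPA does internally.

```python
from statistics import median


def beat_sync(chroma, beat_frames, reduce=median):
    """Collapse frame-level chroma into one vector per inter-beat segment.

    chroma:      list of frames, each a list of pitch-class energies
    beat_frames: frame indices at which beats were detected
    reduce:      how to summarise each bin within a segment (median here)
    """
    n_frames = len(chroma)
    # Segment boundaries: start of track, every in-range beat, end of track.
    bounds = sorted({0, n_frames, *[b for b in beat_frames if 0 < b < n_frames]})
    synced = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        segment = chroma[start:end]
        n_bins = len(segment[0])
        synced.append([reduce(frame[p] for frame in segment) for p in range(n_bins)])
    return synced


# Hypothetical input: 8 frames of 12-bin chroma with beats at frames 2 and 5.
frames = [[float(t + 10 * p) for p in range(12)] for t in range(8)]
per_beat = beat_sync(frames, [2, 5])  # 3 segments: frames 0-1, 2-4 and 5-7
```

Whatever the smoother, the point is that the decoder sees one feature vector per beat rather than one per analysis frame, so short-lived noise within a beat gets ironed out.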

Being a Machine Learning-based system, HPA has the advantage of being receptive to new training data as it becomes available. A key question to consider then is how to train a model with varied training sources. Should one dump all songs together and try to learn a general model, or learn one model per musical genre? Is overfitting an issue with larger training sets? Questions such as these led us to train HPA in 4 different ways, summarised by our models:

  1. HPA_in (NMSD1) – An ‘in-domain’ model, trained on the MIREX09 dataset
  2. HPA_out (NMSD2) – An ‘out of domain’ model, trained on the 522 publicly-released tracks from the McGill dataset. These data are known as the Structural Annotations of Large Amounts of Musical Information, or SALAMI. (note that this dataset is distinct from the unknown test set used for evaluation)
  3. HPA_mix (NMSD3) – A mixed model, trained on the union of the MIREX09 and McGill datasets.
  4. HPA_genre (NMSD4) – A model for each genre in the training set is learned, with the parameters being tied together via a Bayesian prior.

The results for each of these on the McGill dataset are shown below.

What do these numbers mean? Performance in chord estimation is measured by summing the total duration of the correctly labelled chords and dividing by the song length (it can be thought of as the ‘average’ time the algorithm knows the right chord). The two evaluations used in the table above represent two different ways of averaging the results over multiple test songs. ‘Chord Overlap Ratio’ is simply the average result over all songs, whereas ‘Chord Weighted Overlap Ratio’ weighs each song by its length, so that mistaking the chords for a very short piece isn’t penalised as much as it would be for longer songs.
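
The two averages just described can be sketched as follows. This is a toy version under simple assumptions: chords are (start, end, label) tuples in seconds, segments within one annotation don't overlap each other, and two labels count as correct only if they match exactly. The function names are mine, and the official MIREX evaluation may compare chord labels differently.

```python
def overlap_seconds(reference, estimate):
    """Seconds during which the estimated label equals the reference label."""
    total = 0.0
    for r_start, r_end, r_label in reference:
        for e_start, e_end, e_label in estimate:
            if r_label == e_label:
                total += max(0.0, min(r_end, e_end) - max(r_start, e_start))
    return total


def song_length(reference):
    """Total annotated duration of a song in seconds."""
    return sum(end - start for start, end, _ in reference)


def chord_overlap_ratio(songs):
    """Plain average of the per-song scores: every song counts equally."""
    ratios = [overlap_seconds(ref, est) / song_length(ref) for ref, est in songs]
    return sum(ratios) / len(ratios)


def chord_weighted_overlap_ratio(songs):
    """Length-weighted average: total correct seconds over total seconds."""
    correct = sum(overlap_seconds(ref, est) for ref, est in songs)
    total = sum(song_length(ref) for ref, _ in songs)
    return correct / total


# Hypothetical test set: a 10 s piece labelled entirely wrong, and a
# 90 s piece where the first 60 s are labelled correctly.
short_song = ([(0, 10, "C")], [(0, 10, "G")])
long_song = ([(0, 60, "C"), (60, 90, "F")], [(0, 90, "C")])
songs = [short_song, long_song]

print(chord_overlap_ratio(songs))           # (0 + 2/3) / 2, about 0.33
print(chord_weighted_overlap_ratio(songs))  # 60 / 100 = 0.6
```

The short, badly transcribed piece drags the plain average down much more than the weighted one.
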
Overall, the results are pretty good! We’re particularly pleased with NMSD4 coming out on top, as it’s a really cool idea about how parameters can be ‘shared’ between similar genres such as ‘Funk’ and ‘Funk Rock’. This means that information can be shared between these genres, even if we only have a limited number of training examples for one of them. However, we don’t want to share much information between these genres and Hip Hop, so a balance is required! You can see this idea pictorially here:

Each genre (outer boxes, number of songs in parentheses) is assumed to come from a hyper-genre (spiked rectangles. Does anyone know of a better name for these shapes?!). Similar hyper-genres can then share hyper-parameters via a Dirichlet prior over the parameters (see publications page for details).

This post is probably long enough now, and quite technical…I’ll just end by saying that although musicologists might not be particularly excited by 73% accuracy, I think it’s incredible that a fully automatic system is able to recognise chords at all. It’s also worth remembering that this system can run over thousands of songs without intervention, which a human certainly can’t do. I’m hopeful that automatically generated chord sequences can be used to help people learn to play along with their favourite songs and maybe be used in higher-level tasks such as music recommendation (I bet some people have a preference for some types of chord sequences), harmonic audio fingerprinting, and automatic playlisting.

You can download a (simplified) implementation of our system in various forms (open source MATLAB and VAMP plugins under Windows, Linux and Mac) here.

Impressions from one month in Gotham

As it’s been a while since I posted, I thought I’d give an overview of my first month in NYC. In short, I’ve been having a great time!

Work has settled down now and I’m getting stuck into some really exciting research. I can’t say exactly what I’ve been working on, but it’s a challenging problem in MIR and one which could have huge impact if it works out. Also all the guys and gals at LabROSA (new website coming soon!) have been really welcoming. Everyone’s working on awesome projects, from classification of soundtracks and subsequent playlist generation to large scale cover song recognition.

I’ve also been visiting friends in Boston and have to say it was great to get out of the city for a few days. We went to MIT and Harvard; both have stunning campuses and I could well imagine working there some day (pictured is the AI lab at MIT). Also sampled were Boston cupcakes, beers and accents.

I’ve also been keeping busy discovering new music. I recently came across Band of Horses through Rachel Kurtz. We went to see their album launch party in Brooklyn (kind of strange going to a launch party of a band you don’t know anything about, but why not). I’ve also just found out that I’ve got a ticket to the Global Citizen festival in Central Park this weekend, and they’ll be playing there (along with Neil Young, the Foo Fighters, and others)! Twice in two weeks…

I thought I’d end by listing some small things I’ve learnt about the city:

  • Going North/South is a lot easier than going East/West!
  • No-one cooks. Ever.
  • Food is big. I mean really big. A salad can constitute a meal easily.
  • Sometimes it’s just easier to put on an American accent and say ‘faucet’, ‘trash’ etc. than to deal with the hassle of sounding British!
  • I still don’t understand the difference between ‘credit’ vs ‘debit’ and ‘checking’ vs ‘savings’, or why I sometimes have to sign for purchases at random and other times not
  • Tipping is necessary everywhere, but the infrastructure isn’t really there to support it. A check comes for a group meal, from which each person has to work out what they owe from the itemised bill, divide up the tax, decide how much they would like to add for gratuity, and pay. Sounds simple enough, but most places will only split up to 3 ways, some want gratuity on the check, others only cash, sometimes there’s a maximum number of cards they take…complicated!
  • Clubbing is rare, but happy hours are commonplace =)

That’s it for now! Looking forward to lots of visits in the coming months, as well as celebrating my birthday next week!

The Colo(u)r Run 2012

Last weekend two friends and I did ‘The Color Run’! In an attempt to make health and exercise more fun and rewarding, local governments around the US have been funding 5k ‘Fun Runs’ in major cities across the country.

The twist is, at every kilometre checkpoint they have volunteers with powdered (edible, it turns out) paint of a particular colour ready to throw at you! The picture on the left shows us at 3k, suitably covered in blue paint :-) Despite getting up at 5:15 (the run started at 8:30 in deepest, darkest Brooklyn) we had great fun! The video below should give you some idea of the good vibes of the day.


I was lucky to get a ticket from someone at Sarah’s work who dropped out at the last minute, as the event sold out in a few hours. Booking for next year’s event strongly encouraged! And finally, in case you were wondering what we looked like by the end of the run:

Orientation – Reno, NV

I’ve arrived! First stop was Reno, NV for orientation to the US. For those that don’t know (I didn’t), Reno is ‘The Biggest Little City in the World’. With a population of around 220,000, it’s a small city by North American standards. It does have a ‘big city’ feel to it though, with casinos, high rise hotels and bars/clubs scattered about the metropolitan area. Why Reno? I was wondering the same thing, until Fulbright told me that the idea of these ‘Gateway Orientations’ is to expose us to an area of the United States that we otherwise might not get to experience. After driving through the city and seeing the University of Nevada, Reno campus, I can confirm that it’s a long way from NYC!

The goal of Orientation was to provide us with information regarding visas, US culture and politics, health insurance etc. These topics were broken up with some recreation though. We got to visit a Native American museum and chatted to an expert on one of the Native American languages. Above is a photo of the museum curator telling us about all the different tribes which exist in Nevada. After this, we were told a story in their language, which I’ve uploaded to SoundCloud on the audio/video page of this site, along with a translation to English. Many thanks to Zane Peneze for letting me use this audio.

After this, we got a chance to visit Lake Tahoe, which borders both NV and CA. Had an amazing BBQ, swam in the lake and played beach volleyball, all to the background of this sunset! Amazing!

One other highlight I thought I’d mention quickly was a visit to the National Automobile Museum, where I managed to spy this amazing fire truck! Onwards to NYC….

US/UK Fulbright destinations 2012/2013

I’ve attached above a picture of the destinations for the US/UK Fulbrighters for this academic year. Blue points are Scholars (non degree-seeking professionals), whilst Red points show those pursuing a degree or taking a year out from studies (which includes myself).

Really interesting to see the distribution over states and areas in general. There’s an obvious hub around the Ivy League schools in the North-East, but it’s great to see some scholars heading to California, Texas and The South. The Midwest seems to be lacking visitors though – does anyone know of any good colleges in this area? Would be great to see a similar map for US scholars coming to the UK too if such a map exists!

Fulbright Orientation

Just completed my orientation for my Fulbright year at LabROSA (Laboratory for the Recognition and Organization of Speech and Audio) at Columbia University in the city of New York. You may have seen some of the things we got up to via my barrage of Twitter updates, but what I didn’t get the chance to get across so much was the wealth of talent that Fulbright have collated this year. Amongst the incredible brain surgeons, lawyers and hostage negotiators, I thought I’d share some info on some of the scholars I’ve had the honour of meeting over the last few days.

First up, Oscar Sharp is a director whose 5-minute film ‘Sign Language’ won a ton of awards, including the Grand Prize at the Virgin Media Shorts and the Grand Prix, Reed film contest.

Second, Sam Sharma is a Surgeon currently working in London with a passion for Deep House, DJing and production. I’ve posted a set below, but you can find out much more on his soundcloud page.

Finally, I’d like to draw your attention to Miriam Nash, a London-based poet who will also be working at Columbia University. You can read about some of the things she’s up to on her personal blog. The two of us have in fact already been brainstorming about ways we could collaborate on poetry in lyrics whilst we’re working in NYC.

Well, I hope that gives you an idea about the kind of talent that will be heading to the US in the coming months to foster cultural understanding between the US and UK. I’ll end with a quote from Senator J. William Fulbright about the exchange program he set up in 1948, following the end of the Second World War, which really sums up the idea of the program:

“The simple purpose of the exchange program…is to erode the culturally rooted mistrust that sets nations against one another. The exchange programme is not a panacea but an avenue of hope.”