So before Sandy landed and caused chaos in the city, I went to a monthly music hackathon. Working on the weekend after a few too many cocktails in Chelsea might not sound like great fun, but I can tell you it’s a really great event.
The concept of a hackathon is to ‘hack together’ some code or an application in a few hours. It’s a cool way to explore some ideas you might have had on the back burner but never had a chance to code up, or some stuff that doesn’t really fit into your thesis/grant (here’s the obligatory Wikipedia link). Hackathons are regularly used or hosted by Facebook, Spotify, The Echo Nest and Google for getting quick ideas tested fast – and sometimes failing fast too!
Our plan at labROSA was to port a load of Dan Ellis’ MATLAB scripts over to Python. For non-computer scientists, this basically means that it will all be available without an expensive MATLAB license and can be used to foster more research.
If you’re a computer scientist, you still might not think it’s too exciting! But I can assure you, there’s a LOT of cool stuff on Dan’s page for manipulating audio: audio fingerprinting (à la Shazam), aligning MIDI scores to audio for automatic ‘page turning’ and rehearsal for musicians, automatic beat tracking, and phase vocoding, which lets you make a signal faster/slower or higher/lower in pitch without distortion, amongst many other things.
So, at some point (stay tuned…) all of the above and much more will be free to researchers! However, to market this work, it was decided we needed something shiny to show off, especially for the end of day presentations!
I decided to see if I could use some of these scripts to automatically generate ‘gear shifts’ in pop music. A gear shift is basically a really cheesy key shift in a pop song where the chorus is repeated a semitone/tone up to add interest to the tune. It’s a great way of adding an extra minute to a song, and literally ‘lifts you up’ just as the song is becoming dull. They’re a staple of just about any X Factor Christmas or Westlife track, but the best example I could find is Whitney Houston’s I Will Always Love You (skip to 3 minutes).
Boom! What a great floor tom hit. So, my plan was to automatically ‘gear-shift’ any song. Then any song can be made 20% more awesome! Turns out it’s quite tricky, but you can do a pretty good job using some of Dan’s code. I first extracted beat-synchronous chroma features (read as: a description of how the pitch content evolves from beat to beat) and used these to automatically find the chorus. Below is a self-similarity matrix over the beats of the song, so pixel (i,j) represents the (cosine) similarity between beat i and beat j.
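For the curious, here is a minimal sketch of that step in Python with NumPy. It is not Dan's code or the code I ran on the day: the function names beat_sync and cosine_self_similarity are made up for illustration, and it assumes you already have frame-level chroma (one 12-bin row per frame) and the frame index of each beat from some other tool.

```python
import numpy as np

def beat_sync(frames, beat_frames):
    """Average frame-level features between consecutive beats.

    frames: (n_frames, n_dims) array, e.g. 12-bin chroma per frame.
    beat_frames: strictly increasing frame indices where each beat starts
    (assumed to come from a separate beat tracker).
    """
    frames = np.asarray(frames, dtype=float)
    bounds = list(beat_frames) + [len(frames)]
    return np.array([frames[a:b].mean(axis=0)
                     for a, b in zip(bounds[:-1], bounds[1:])])

def cosine_self_similarity(features):
    """Return S where S[i, j] is the cosine similarity of beat i and beat j."""
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave silent (all-zero) beats at zero similarity
    unit = features / norms
    return unit @ unit.T
```

Every beat is perfectly similar to itself, so the main diagonal is all ones and the matrix is symmetric. The interesting bits are the stripes away from the diagonal.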
Dark colours are high similarity, and I smoothed the matrix in the top pane to highlight long-term similarity and allow some local dissimilarity. Then I looked for strong diagonal stripes, which in theory represent large repeated sections (such as a chorus). Finding these is really the tough part, but in red I’ve highlighted the best candidate for this song (I biased it to prefer beats near the end of the song).
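To give a flavour of the stripe hunting, here is a rough sketch under some loud assumptions: the smoothing is a plain average along each diagonal, the repeated section has a fixed length in beats, and the bias towards the end of the song is a simple linear bonus. The names smooth_along_diagonals, best_repeat, min_lag and end_bias are hypothetical, and this is a simplification rather than the exact method I used.

```python
import numpy as np

def smooth_along_diagonals(sim, length):
    """Average each cell with the next `length - 1` cells down its diagonal.

    A high value at (i, j) then means beats i..i+length-1 resemble
    beats j..j+length-1 on average, even if a beat or two disagree.
    """
    n = sim.shape[0]
    m = n - length + 1
    out = np.zeros((m, m))
    for k in range(length):
        out += sim[k:k + m, k:k + m]
    return out / length

def best_repeat(sim, length, min_lag=1, end_bias=0.0):
    """Return (i, j), i < j: start beats of the best repeated section.

    min_lag: minimum gap in beats between the two copies (set it to at
    least `length` to stop a section matching an overlapping copy of itself).
    end_bias: hypothetical bonus, growing linearly with j, that prefers
    repeats whose second copy sits near the end of the song.
    """
    smoothed = smooth_along_diagonals(sim, length)
    m = smoothed.shape[0]
    i_idx, j_idx = np.indices((m, m))
    score = smoothed + end_bias * j_idx / max(m - 1, 1)
    score[j_idx - i_idx < min_lag] = -np.inf  # drop the main diagonal and lower triangle
    i, j = np.unravel_index(np.argmax(score), score.shape)
    return int(i), int(j)
```

In practice you would try several section lengths and compare the scores, since you don't know in advance how many beats the chorus lasts.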
After this it’s pretty simple to grab this section of the audio, fade out before, phase vocode the detected chorus up a semitone, add some compression for drama and, voilà!
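The splicing can be sketched roughly like this. A semitone up means multiplying every frequency by 2^(1/12), about 1.0595. The pitch shifter itself is passed in as a function here (a stand-in for the phase vocoder, with a hypothetical signature), the gear_shift name and its parameters are made up, the compression step is left out, and ending the output right after the shifted chorus is my simplification.

```python
import numpy as np

SEMITONE_RATIO = 2.0 ** (1.0 / 12.0)  # frequency ratio of one semitone up, about 1.0595

def gear_shift(audio, sr, start_s, end_s, pitch_shift, fade_s=0.5):
    """Fade out just before the chorus, then append a shifted copy of it.

    audio: 1-D float array of samples; sr: sample rate in Hz.
    start_s, end_s: detected chorus boundaries in seconds.
    pitch_shift: caller-supplied callable(samples, ratio) -> samples,
    assumed to raise the pitch by `ratio` without changing the duration.
    """
    audio = np.asarray(audio, dtype=float)
    start, end = int(start_s * sr), int(end_s * sr)
    n_fade = min(int(fade_s * sr), start)
    lead = audio[:start].copy()
    if n_fade > 0:
        lead[-n_fade:] *= np.linspace(1.0, 0.0, n_fade)  # linear fade to silence
    chorus = pitch_shift(audio[start:end], SEMITONE_RATIO)
    return np.concatenate([lead, chorus])
```

Swapping SEMITONE_RATIO for its square would give the whole-tone version of the gear shift.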
Pretty neat, huh?! Sure, it’s not perfect: the vocals get a little chipmunky because they’re already in quite a high register. But that’s the beauty of a hack!
Stay tuned for the release of some cool Python code for phase vocoding, structural segmentation (finding choruses) etc. in the near future.