Hacker News
Transcribing Piano Rolls, the Pythonic Way (zulko.github.io)
310 points by gcardone_ on April 11, 2014 | 35 comments



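The faster way of doing this:

    from numpy import pi, cos, sin  # NumPy versions, which work on arrays

    def fourier_transform(signal, period, tt):
        """ See http://en.wikipedia.org/wiki/Fourier_transform
        How come Numpy and Scipy don't implement this ??? """
        # correlate the signal with a cosine and a sine of the given period
        f = lambda func : (signal*func(2*pi*tt/period)).sum()
        return f(cos)+ 1j*f(sin)

is to use the FFT.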

What you want is the power spectral density in the discrete case, called the power spectrum. It can be calculated by multiplying the discrete Fourier transform (computed with the FFT) by its complex conjugate, and then shifting the result so that zero frequency sits in the middle. NumPy can do it. Here is an example: http://stackoverflow.com/questions/15382076/plotting-power-s...
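A minimal NumPy sketch of that recipe, assuming a real-valued 1-D signal. The function name `power_spectrum` and the 2 Hz / 16 Hz test values are illustrative only and are not taken from the linked answer:

```python
import numpy as np


def power_spectrum(signal, sample_rate=1.0):
    """Return (freqs, power) for a real 1-D signal.

    power = X * conj(X), where X is the FFT of the signal; both arrays are
    passed through fftshift so that zero frequency sits in the middle.
    """
    x = np.asarray(signal, dtype=float)
    X = np.fft.fft(x)
    power = (X * np.conj(X)).real  # same values as np.abs(X) ** 2
    freqs = np.fft.fftfreq(len(x), d=1.0 / sample_rate)
    return np.fft.fftshift(freqs), np.fft.fftshift(power)


# Hypothetical example: a 2 Hz cosine sampled at 16 Hz for 4 seconds.
t = np.arange(64) / 16.0
freqs, power = power_spectrum(np.cos(2 * np.pi * 2.0 * t), sample_rate=16.0)
peak = abs(freqs[np.argmax(power)])  # spectrum of a real signal is symmetric, so take |f|
```

The spectrum of a real signal has mirror-image peaks at +f and -f, which is why the sketch reports the absolute value of the peak frequency.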


I knew I was going to get this remark :) Now, correct me if I am wrong, but I think the FFT (which computes the discrete Fourier transform) cannot replace the continuous Fourier transform in my case, because the optimal periods I find are non-integer values. In the first case, the holes are separated by 7.5 pixels; the FFT could only have told me that they are separated by 7 or 8 pixels, which is not precise enough. Same thing for the tempo: a beat corresponds to 7.1 frames of the video, and an FFT would have told me 7.

If someone knows a way to use the FFT to get non-integer periods (apart from oversampling the signal), I'll gladly change the code.


The maximum frequency you can detect is limited by your sampling rate, but there's no limit on how finely you can subdivide the frequencies below it.

It's controlled by a parameter, NFFT: the PSD will compute NFFT/2+1 values evenly spaced between 0 and the Nyquist frequency.

So say the frame rate is 15Hz and you compute with NFFT=2048; then PSD[970] contains the power at about 7.10Hz (970 × 15/2048 ≈ 7.104).
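A minimal NumPy sketch of the same idea, assuming a clean 1-D signal sampled once per video frame. Here the `n` argument of `np.fft.rfft` plays the role of NFFT: it zero-pads the signal so that bins are 1/n cycles per frame apart. The function name `dominant_period` and the 7.1-frame test signal are hypothetical. Since the blog post needs a period (frames per beat) rather than a frequency, the sketch converts the peak bin back with 1/f:

```python
import numpy as np


def dominant_period(signal, nfft):
    """Estimate the dominant period of `signal`, in samples, from the peak of
    a zero-padded FFT. Bins are spaced 1/nfft cycles per sample apart."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC component so bin 0 cannot win
    power = np.abs(np.fft.rfft(x, n=nfft)) ** 2  # rfft zero-pads x to nfft samples
    freqs = np.fft.rfftfreq(nfft)  # 0 ... 0.5 cycles per sample
    k = np.argmax(power[1:]) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[k]


# Hypothetical signal: one beat every 7.1 frames, observed for 600 frames.
frames = np.arange(600)
beat = np.cos(2 * np.pi * frames / 7.1)
coarse = dominant_period(beat, nfft=600)    # no padding: bins 1/600 apart
fine = dominant_period(beat, nfft=1 << 16)  # heavy padding: bins 1/65536 apart
```

Zero-padding only samples the spectrum on a finer grid; it does not separate two close frequencies any better than the signal length allows. Note also that even without padding, the bin periods nfft/k are generally not integers.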

This was a really cool project by the way!


Also, it's not as widely known as the FFT, but if you roughly know the frequency of interest, you can use the Goertzel algorithm to compute a chosen number of bins around that specific frequency and then pick the maximum. With the FFT, by contrast, you have to compute a large number of bins with a large NFFT to get enough frequency resolution, and then discard 99% of the results. Going further, the Generalized Goertzel algorithm does the same thing as the original Goertzel but lets you query non-integer multiples of the fundamental frequency: http://asp.eurasipjournals.com/content/2012/1/56
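A pure-Python sketch of the Goertzel approach, assuming you already have a rough period (here 7 frames) and only need the location of the peak. The helper names `goertzel_power` and `refine_period` and the scan parameters `span` and `steps` are hypothetical. The recurrence's final magnitude equals the magnitude of the discrete-time Fourier transform at whatever frequency you plug in, integer bin or not. As far as I understand, what the generalized version in the linked paper adds is the phase correction for non-integer frequencies, which this sketch skips because only the magnitude matters for finding a peak:

```python
import math


def goertzel_power(x, freq):
    """|X(freq)|^2, the squared magnitude of the DTFT of x at `freq`
    (cycles per sample), computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq)
    s1 = s2 = 0.0  # s[n-1] and s[n-2]
    for sample in x:
        s1, s2 = sample + coeff * s1 - s2, s1
    # |s[N-1] - exp(-jw) * s[N-2]|^2, expanded for real s1, s2
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def refine_period(x, rough_period, span=0.5, steps=201):
    """Evaluate candidate periods in [rough_period - span, rough_period + span]
    and return the one whose frequency has the largest Goertzel power."""
    candidates = [rough_period - span + 2.0 * span * i / (steps - 1)
                  for i in range(steps)]
    return max(candidates, key=lambda p: goertzel_power(x, 1.0 / p))


# Hypothetical signal: one beat every 7.1 frames, rough guess of 7 frames.
x = [math.cos(2 * math.pi * n / 7.1) for n in range(600)]
best = refine_period(x, rough_period=7.0)
```

Each candidate costs one O(N) pass over the signal, so in terms of arithmetic, scanning 201 candidates is cheap compared with a huge zero-padded FFT whose bins are mostly thrown away.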


Thanks, I learned something. I will try it and amend the blog when I have time.


Forgot to say, great post! :)


There are a lot of parametric methods (as opposed to the nonparametric FFT) for tracking frequency. I'm not totally convinced they're applicable to this case, but they might be fun to try out. Maybe start here: http://en.wikipedia.org/wiki/Multiple_signal_classification
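MUSIC is the method on that page; here is a rough NumPy sketch for the simplest case, a single real sinusoid in a little noise. The snapshot length `m=20`, the model order 2 (one real cosine is two complex exponentials), the scan grid and the noise level are arbitrary assumptions, not tuned for piano-roll data, and `music_frequency` is a hypothetical name:

```python
import numpy as np


def music_frequency(x, m=20, order=2, num_points=4901):
    """Estimate the frequency (cycles per sample) of a single real sinusoid
    from the peak of the MUSIC pseudospectrum.

    m     : length of the overlapping snapshots (size of the covariance matrix)
    order : signal-subspace dimension; a real cosine spans 2 complex exponentials
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    snapshots = np.array([x[i:i + m] for i in range(len(x) - m + 1)])
    R = snapshots.T @ snapshots / len(snapshots)  # m x m sample covariance
    _, eigvecs = np.linalg.eigh(R)                # eigenvalues in ascending order
    noise = eigvecs[:, :m - order]                # noise subspace
    freqs = np.linspace(0.01, 0.5, num_points)    # scan grid, step 1e-4
    steering = np.exp(2j * np.pi * np.outer(np.arange(m), freqs))
    # Pseudospectrum: large where the steering vector is (nearly)
    # orthogonal to the noise subspace.
    pseudo = 1.0 / np.sum(np.abs(noise.conj().T @ steering) ** 2, axis=0)
    return freqs[np.argmax(pseudo)]


rng = np.random.default_rng(0)
n = np.arange(600)
x = np.cos(2 * np.pi * n / 7.1) + 0.05 * rng.standard_normal(600)
period = 1.0 / music_frequency(x)
```

Unlike the FFT, the precision here is not tied to a bin grid but to how well the covariance separates signal from noise, which is the usual appeal of parametric methods; whether that survives real video data is an open question.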


See a master at work making original rolls at QRS. http://www.youtube.com/watch?v=i3FTaGwfXPM

It was a fun place to visit in the '70s, after watching my father rebuild our player piano.


Interesting that they used computers to make them. It seems obvious in hindsight; player piano music is digital!


Also interesting that we had digital data storage, in the form of punched cards and tape, decades before digital computers.


Longer than that. The Jacquard loom was invented in 1801, and the player piano was first demonstrated in 1876.


excellent to see the Apple //e [1] http://youtu.be/i3FTaGwfXPM?t=4m28s


What a fascinating convergence of math, music and Python. Many people I meet who don't specialize in math but have taken university-level courses in it seem to remember the Fourier transform as a highlight, probably because of its many applications.


I love the abundance of Python. For those unaware, even the youtube-dl command line utility he used to download the video is written in Python.


And contrary to what its name suggests, youtube-dl supports 150+ different services: http://rg3.github.io/youtube-dl/supportedsites.html


I was thinking the same thing. I had never heard of that tool, but I will definitely use it in the future.


Very cool!

Relevant: Zenph makes "re-performances" of old piano recordings. They take a recording, do music transcription magic to get the exact timings and velocities of each note event, and then feed that into a player piano. So it's as if you are listening to the ghost of Rachmaninov sitting at the piano, as shown here: https://www.youtube.com/watch?v=eevzbV6Hkkk&t=28 (music starts at 0:28)

(I just visited http://zenph.com for the first time in about a year, and it appears that they've pivoted into a music education company.)


Interesting question - is the author's transcription a derivative work of the video? And if so, is he actually allowed to release his transcription into the public domain (without the permission of the author of the video)?


No, it's only derivative in the sense of process. The video lacks originality; for the musical notes it is merely a mechanical reproduction of the punched holes. Similarly, a photograph of a public domain painting is also in the public domain. See: Bridgeman Art Library v. Corel Corp., 36 F. Supp. 2d 191 (S.D.N.Y. 1999). At least this is the law in the United States, which is sensible; absurdity of other jurisdictions may vary.


It's nice to know our system accounts for cases like this. Thanks for the detailed info!


What if you tried to transcribe the music solely from the Fourier transform of the audio? I expect the piano produces an abundance of harmonics, but there should be some way to distinguish them from the fundamentals of the keys actually played. Hasn't someone done it already?
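For a single note, one textbook trick (not something proposed in this thread) is the harmonic product spectrum: multiply the magnitude spectrum by copies of itself decimated by 2, 3, ..., so that only a bin whose harmonics are all present stays large. A NumPy sketch, assuming a monophonic one-second clip and perfectly harmonic overtones (real piano strings are slightly inharmonic); `hps_pitch`, `to_midi` and the test tone are hypothetical:

```python
import numpy as np


def hps_pitch(signal, sample_rate, n_harmonics=4):
    """Estimate the fundamental frequency (Hz) of a single note with the
    harmonic product spectrum."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    hps = spectrum.copy()
    for h in range(2, n_harmonics + 1):
        decimated = spectrum[::h]           # decimated[k] == spectrum[h * k]
        hps[:len(decimated)] *= decimated
    limit = len(spectrum[::n_harmonics])    # bins where every harmonic was multiplied in
    k = np.argmax(hps[1:limit]) + 1         # skip the zero-frequency bin
    return k * sample_rate / len(signal)    # bin spacing is sample_rate / N


def to_midi(freq):
    """Nearest MIDI note number (A4 = 440 Hz = note 69)."""
    return int(round(69 + 12 * np.log2(freq / 440.0)))


# Hypothetical note: A3 (220 Hz) whose 2nd harmonic is louder than the fundamental.
sr = 8000
t = np.arange(sr) / sr                      # one second of audio
note = sum(a * np.sin(2 * np.pi * 220 * h * t)
           for h, a in zip(range(1, 5), [0.3, 1.0, 0.8, 0.6]))
f0 = hps_pitch(note, sr)
```

Here the strongest FFT peak is at 440 Hz (an octave error), while the product spectrum recovers 220 Hz. Chords, where harmonics of different keys coincide, are where this simple trick stops working.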


i've seen NNLS/chroma referenced in a few places, like the chordify papers:

http://isophonics.net/nnls-chroma

Here's chordify: http://ismir2012.ismir.net/event/papers/295_ISMIR_2012.pdf

That conference has great references but unfortunately hasn't been repeated since 2012 http://www.ismir.net/proceedings/index.php


It's certainly hard.

However, this would be one of the best cases for it: it's a single instrument, and you could make a careful recording of it.


That was a lovely read, thank you so much for writing and sharing it.


Really fantastic hack. Now try transcribing with just the audio track.


That's a hard problem. If you have some material like that with a clear recording, the only good commercial solution that I know of is Melodyne, and its developer isn't saying how it works. In theory you just look for multiple peaks in the FFT, but this is much easier said than done.
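To see why "just look for multiple peaks" is easier said than done, here is the naive version as a NumPy sketch. The function name `naive_notes`, the 10% threshold and the test tone are assumptions for illustration. Fed a recording of one key with overtones, it reports each overtone as an extra note:

```python
import numpy as np


def naive_notes(signal, sample_rate, rel_threshold=0.1):
    """Report every local maximum of the magnitude spectrum that exceeds
    rel_threshold * (largest peak) as a note, given as a MIDI number."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    floor = rel_threshold * spectrum.max()
    notes = set()
    for k in range(1, len(spectrum) - 1):
        is_peak = spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]
        if is_peak and spectrum[k] > floor:
            notes.add(int(round(69 + 12 * np.log2(freqs[k] / 440.0))))
    return sorted(notes)


# Hypothetical recording of ONE key, A3 (220 Hz), with three overtones.
sr = 8000
t = np.arange(sr) / sr
note = sum(a * np.sin(2 * np.pi * 220 * h * t)
           for h, a in zip(range(1, 5), [1.0, 0.5, 0.4, 0.3]))
print(naive_notes(note, sr))  # prints [57, 69, 76, 81]: one key, four "notes"
```

The overtones land on A4, roughly E5 and A5, all of which are legitimate piano keys, so peak picking alone cannot tell them apart from keys that were actually pressed.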


I built a Plogue Bidule patch before Melodyne rolled out DNA, and it is extremely difficult to find the optimal FFT parameters for an accurate conversion. I can't imagine that an algorithm getting them right by analyzing the sample would be any less difficult. Ableton's and Cubase's options are pretty rough too. I am a drummer, though; I am just trying to make up for my ears.


My favourite blog post of 2014. Thank you for sharing.


I think this is a nice solution because it takes care of the hardware side of things by making use of a garden variety video camera.


This is beautiful, it's one good idea after another, good job!


This is really nice, thanks for sharing it with us.


So, so cool. I love posts like this.


Nice post, thanks for sharing!


fantastic. with your permission, i'd love to use this to demo python!


Yeah, sure.



