Python, Pitch Shifting, and the Pianoputer (zulko.github.io)
134 points by dzderic on June 10, 2014 | 30 comments



Writing sound/music-generating applications is one of the most fun things you can do with your computer!

Here's a simple synth I wrote some years ago. https://github.com/rikusalminen/jamtoysynth/blob/master/src/...

It was originally intended for a 4k intro (ie. demoscene) which I never finished. The synth was written in x86 assembler using 16.16 fixed-point arithmetic, because the instruction encoding for grabbing the lower 16-bit part (AX) of a 32-bit register (EAX) is very short. The old assembler synth was under 1k in size, uncompressed.

This version uses floats and is written in easy-to-read C, but it's essentially the same logic.
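For the curious, here's roughly what a 16.16 phase accumulator means, as a simplified Python sketch of my own (the rotated-storage trick that makes the AX access cheap is omitted):

  FRAC_BITS = 16
  MASK32 = (1 << 32) - 1  # wrap like a 32-bit register

  def square_samples(freq_hz, sample_rate=48000, n=8):
      # per-sample phase increment; 2**32 phase units == one full cycle
      step = int(freq_hz / sample_rate * (1 << 32)) & MASK32
      phase, out = 0, []
      for _ in range(n):
          phase = (phase + step) & MASK32   # overflow wraps for free
          cycle_pos = phase >> FRAC_BITS    # integer part, 0..65535
          out.append(1.0 if cycle_pos < (1 << 15) else -1.0)
      return out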

I also enjoy using the keyboard as a piano-like control device, like tracker software back in the day. Here's an excellent example of playing music using the qwerty keyboard: https://www.youtube.com/watch?v=3JQkW6BgUYU


To teach myself a bit more about x86 assembly I once wrote a speaker-clicking loop. It was very educational: when I first played A440, a person in the room spoke up and said "I have perfect pitch. That's 438Hz".

He was right. I had a delay bug.
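(Back-of-the-envelope, since the pitch of a click loop is just the reciprocal of its period, the unaccounted delay only has to be on the order of ten microseconds per cycle:)

  period_440 = 1 / 440.0  # ~2272.7 us per cycle
  period_438 = 1 / 438.0  # ~2283.1 us per cycle
  print((period_438 - period_440) * 1e6)  # ~10.4 us of extra delay per cycle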


That's crazy, can someone with perfect pitch really be that accurate?


Actually, it might not be as difficult as you think. Someone with a good knowledge of music theory and some perfect-pitch ability may well be able to recognize the difference between the current A440 tuning and the older A435. Given that ability, recognizing that the note sits between the two and then guessing 438 Hz (and being lucky) doesn't seem too hard to believe. That said, I'm sure I couldn't even guess within 100 Hz.


Yes. I later tested him, and he had 1 Hz resolution in that range(!!!)


That's about four cents, or a twenty-fifth of a semitone. I don't think I can even distinguish relative pitch at that level (although 15 cents I can do easily -- roughly the difference between a just major third and an equal-tempered one).
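(That's the standard cents conversion, 1200 * log2(f2/f1):)

  import math
  print(1200 * math.log2(441 / 440))  # ~3.93 cents for a 1 Hz step at A440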


A MOD player was my first ever real x86 asm program - I remember discovering that trick too - store the 16.16 value rotated by 16 bits, then use ADD and ADC to increment it according to frequency. Good times!


Not remotely as cool, but I was working on a modular synthesizer lib [1] based on the JS audio node API.

There's a small demo [2], based on an older version of the lib, which can be played using the keyboard.

[1]: https://github.com/zenoamaro/audiokit

[2]: http://zenoamaro.github.io/audiokit/


The explanation and the Python code were just fine but I was 100x more impressed by his keyboard performance. I still hit backspace at least once per line.


FL Studio (a so-called digital audio workstation) has an option for using the typing keyboard as a piano keyboard. The layout looks like this: http://www.image-line.com/support/FLHelp/html/img_glob/qwert... I'll definitely try the layout mentioned in the article; the playing looks very natural.

Not very hacky, but you can download the free demo of FL and have a go at playing with many synthesizers and audio samples. It's too bad my keyboard (and, I suspect, a lot of keyboards) doesn't support every possible key combination; even a lot of three-note chords are impossible to play.


On a side note, Linux Multimedia Studio (LMMS) is an open source clone of FL Studio. Although it's not on par with FL Studio's experience and features, it's pretty neat and has all the essentials in place!


It may be a standard thing for all I know, but I think they nicked the layout from Jeskola Buzz [1], where I cut my teeth many moons ago.

Anyone who hasn't tried it should give it a whirl - it's got the hexadecimal-happy vertical sequencing of a tracker, combined with an insanely flexible modular sound workspace. I switched to FL 'full time' eight or so years ago, but I've never found anything as intuitive for sound design as Buzz.

[1] http://www.jeskola.net/buzz/


  factor = 2**(1.0 * n / 12.0)
Aargh, equal temperament! b^)


You were hoping for just intonation?


Funny you should ask. I spent a while a few years ago messing around with csound, synthesizing Bach chorales in a number of tuning systems. You could tell a difference, IIRC, even with kind-of-unpleasant synthesized tones. I'd have to go back and look up what I did, but I must have done both equal temperament and (some kind of) just intonation.


You can absolutely tell a difference! Probably more so with synthesized sounds, which don't have natural pitch deviations or complex harmonics. The beating you hear in any even-tempered dyad other than an octave is really obvious with sine or square waves, presumably especially so with the sustained notes in a chorale :)
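To put rough numbers on it (my own arithmetic, assuming ideal harmonic partials): in a major third the root's 5th harmonic and the third's 4th harmonic nearly coincide, and their offset is the beat rate.

  root = 440.0
  just_third = root * 5 / 4           # 550.00 Hz: 4 * 550 == 5 * 440, no beats
  et_third = root * 2 ** (4 / 12.0)   # ~554.37 Hz
  print(4 * et_third - 5 * root)      # ~17.5 Hz beating in the equal-tempered third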


There's a blues guy, Willie McBride, who has recorded blues in just intonation. It's kind of interesting to listen to; not for everybody, but if you're OK with Indian and other non-Western classical music, it's worth checking out.


For anybody intrigued by this, the next step may be a MicroBrute, Bass Station, or maybe an MS-20 Mini, all for under US$500 street. Used MicroBrutes can be had for close to $200.

http://thesynthesizersympathizer.blogspot.com/2014/03/buying...


Or go completely the other way and get the most basic MIDI controller and play around with Pure Data, SuperCollider and other fun stuff. Not saying that would be better, but it might be more approachable for the HN crowd, and probably cheaper too.


Cool, but seems a bit hackish to me. Wouldn't the "proper" way to pitch shift involve taking a Fourier transform of the sound and simply shifting all the frequencies by some factor? Wouldn't this result in better quality, or are there aliasing issues or something I'm not aware of?


It does seem a bit funny to do it in two stages, but it's the standard way... it might seem more natural if you were to do the resampling stage first: resample by a factor of 2x and you have the same sound but an octave lower and at half speed, then a 50% time-stretch gives you back the original duration while keeping the new pitch.

The phase vocoder does use an FFT on each window internally, so that it can ensure the phases remain continuous when everything is merged back together. There are variants that let you monkey around with the FFT coefficients before the merge, so you can pitch-shift that way, but I believe when you stitch it back together you end up with the exact same artifacts as the two-step way. I think people have concentrated on perfecting time-stretching, since pitch-shifting can be derived from it.
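For reference, the core of a phase vocoder time-stretch fits in a dozen lines of numpy. This is my own sketch of the standard approach (factor < 1 stretches, > 1 compresses), not the article's exact code:

  import numpy as np

  def stretch(snd, factor, window_size=1024, h=256):
      window = np.hanning(window_size)
      phase = np.zeros(window_size)
      result = np.zeros(int(len(snd) / factor) + window_size)
      for i in np.arange(0, len(snd) - (window_size + h), h * factor):
          i = int(i)
          # two overlapping windows, h samples apart in the input
          s1 = np.fft.fft(window * snd[i: i + window_size])
          s2 = np.fft.fft(window * snd[i + h: i + h + window_size])
          # accumulate phase deltas so the re-spaced windows stay continuous
          phase = (phase + np.angle(s2) - np.angle(s1)) % (2 * np.pi)
          rephased = np.fft.ifft(np.abs(s2) * np.exp(1j * phase)).real
          j = int(i / factor)  # output windows land h samples apart
          result[j: j + window_size] += window * rephased
      return result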

The problem with doing an FFT on the entire length of the sample is that shifting all frequencies would then simply speed it up as well as changing the pitch ;) Chopping it up into bits is key to separating the fundamental frequencies that we perceive as the general "pitch" from all the time-varying harmonics that we perceive as "timbre".

edit: what is maybe a bit hackish is the crude resampling here - when going to the trouble of building a phase vocoder at least some linear interpolation might be appropriate rather than just dropping/repeating samples ;)
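Something like this, say (my sketch; same crude resampler, but mixing neighboring samples instead of rounding):

  import numpy as np

  def speedx_linear(snd, factor):
      # fractional read positions instead of dropped/repeated samples
      idx = np.arange(0, len(snd) - 1, factor)
      lo = idx.astype(int)
      frac = idx - lo
      mixed = (1 - frac) * snd[lo] + frac * snd[lo + 1]
      return mixed.astype(snd.dtype)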


Rate conversion in the time domain is fine, but the "proper" way would smooth the lower pitch tones with a low-pass filter, not simply a zero-order hold.

Furthermore, the "proper" way would pass the sound through an anti-aliasing filter before creating a higher pitch tone. However, at 48kHz (as in the post), this isn't really an issue for audio.
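(For what it's worth, scipy's polyphase resampler applies both filters for you; the Fraction bit is just my own convenience for turning the irrational semitone ratio into the rational up/down factors it needs:)

  from fractions import Fraction
  from scipy.signal import resample_poly

  def resample_filtered(snd, factor):
      # resample_poly low-pass filters internally, covering both
      # the smoothing and the anti-aliasing cases
      ratio = Fraction(factor).limit_denominator(1000)
      return resample_poly(snd, ratio.numerator, ratio.denominator)

  # e.g. one semitone down: lengthen by 2**(1/12), then time-compress
  # lower = resample_filtered(snd, 2 ** (1 / 12.0))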



I checked out the solution on github, but my results were not as advertised in the video. On my laptop it sounded like a dying cat of some sort.


It would be more helpful if you could describe the results more accurately.

The most common causes of audio problems I've found are mismatched sample rates and failure to meet the deadline of the audio interrupt, which causes choppy or distorted audio. A mismatched audio format (ie. number of channels or bits per sample) might also cause problems.

Also, knowing which OS/audio API backend is being used is vital information for debugging.
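A quick sanity check along those lines (my guess at the pygame setup, with a placeholder filename): initialize the mixer to match the file, rather than assuming it will adapt.

  import pygame
  from scipy.io import wavfile

  fs, snd = wavfile.read('sample.wav')
  channels = 1 if snd.ndim == 1 else snd.shape[1]
  # the mixer's rate and format must match the arrays you feed it
  pygame.mixer.init(frequency=fs, size=-16, channels=channels)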


You are right, feedback should contain constructive comments, but I couldn't be bothered to go into further details when I wrote my response. Here is a short video that I made showing what it really sounds like on my machine:

https://vimeo.com/97857232

I can't afford to "nerd into" the actual problem right now. Hence the witty and non-constructive comment. Apologies.


I get exactly the same effect.

EDIT: Looking into it further, I get that effect just loading up the provided .wav (which sounds fine in any media player) and not doing any processing on it, just trying to play it straight up.

EDIT AGAIN: Got it: the very low buffer size of 100 is causing the problem. Using the default size, everything works like a charm.
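(In pygame terms, something like the following; 2048 is just a typical safe power of two, not a magic value:)

  import pygame
  # at 48 kHz a 100-sample buffer must be refilled ~480 times a second;
  # a larger buffer gives the OS scheduler some slack
  pygame.mixer.init(frequency=48000, size=-16, channels=1, buffer=2048)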


That's either wrong sample rate or missing the deadline for the audio callback. My guess is the former.


Strange. Even with headphones?


Awesome Pianoputer chops! And an educational Python write-up. Thanks!



