A library for audio feature extraction, regression, classification, segmentation (github.com/tyiannak)
107 points by nothrowaways on Dec 11, 2021 | 12 comments



I was in the market for one of these and ended up with yaafe [1], which is a little older but has, IMO, a better API, more flexible output, and C as well as Python bindings.

Also, the documentation is rather good, with links to the relevant papers for each algorithm. The above library, in contrast, is a little impenetrable for me.

I'm using this with postgres and supercollider for more of an artistic project though, so YMMV.

1. https://github.com/Yaafe/Yaafe
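
For a sense of what yaafe usage looks like, here is a rough sketch of its Python quick-start pattern from memory; the feature names and parameters are only examples, so double-check them against the yaafe docs:

  from yaafelib import FeaturePlan, Engine, AudioFileProcessor

  # Declare the features to extract (names/params per yaafe's feature list).
  fp = FeaturePlan(sample_rate=44100)
  fp.addFeature('mfcc: MFCC blockSize=512 stepSize=256')
  fp.addFeature('rolloff: SpectralRolloff blockSize=512 stepSize=256')

  # Compile the plan into a dataflow and run it over an audio file.
  engine = Engine()
  engine.load(fp.getDataFlow())
  afp = AudioFileProcessor()
  afp.processFile(engine, 'track.wav')

  feats = engine.readAllOutputs()  # dict of numpy arrays, one per feature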


>I'm using this with postgres and supercollider for more of an artistic project though, so YMMV.

Do you mind telling us more about this project?


It's basically like a very granular sample database. I have audio frame data saved directly on some table rows, other tables for all the different classifiers/analyzers, and then some other data that can group these things by detected segment. Then, using sclang's rudimentary Pipe class, I can query things directly from SuperCollider and fill buffers directly.

"Find me 36 segments that are probably a minor chord with a root in a, that sound a little more airy" or "Find 10 segments that are maybe a bass drum"

It's mostly just for fun/making music.
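
A hypothetical sketch of what the first query above could look like on the Postgres side, in Python rather than sclang purely for illustration; the table and column names are invented, and "airy" is crudely approximated by a spectral-centroid threshold:

  # Hypothetical sketch only: the schema below is made up, not the one
  # described above.
  import psycopg2

  conn = psycopg2.connect("dbname=samples")
  cur = conn.cursor()
  cur.execute("""
      SELECT s.id, s.frames
        FROM segments s
        JOIN chord_estimates c ON c.segment_id = s.id
        JOIN timbre t          ON t.segment_id = s.id
       WHERE c.quality = 'minor'
         AND c.root = 'A'
         AND t.spectral_centroid > 3000   -- crude stand-in for "airy"
       ORDER BY c.confidence DESC
       LIMIT 36;
  """)
  segments = cur.fetchall()  # frame data ready to load into buffers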


I used this for a personal project [1] a couple of years ago to shorten audio files without making them sound like they were cut. This is done by removing repeated sections, like replacing two choruses with one. I initially wanted it to "resize" background music to match video footage I had, but it is kind of fun to just mess around with songs too (like those content aware scale picture memes, but to create the shortest possible audio).

I think for my use case specifically, the library was kind of overkill, though, and something like librosa [2] would have been enough for feature extraction (rough sketch below the links).

1: https://projects.loud.red/snipsnip/

2: https://librosa.org/doc/latest/index.html
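
A minimal sketch of what librosa-only feature extraction could look like here, assuming a chroma-plus-self-similarity approach to spotting repeats (not the actual project code):

  # Minimal sketch, not the project's actual code: chroma features plus a
  # self-similarity matrix, a common way to find repeated sections.
  import librosa

  y, sr = librosa.load("song.mp3")                 # audio as a float array
  chroma = librosa.feature.chroma_cqt(y=y, sr=sr)  # 12 x n_frames pitch classes
  chroma = librosa.util.normalize(chroma, axis=0)

  # Frame-vs-frame cosine similarity; bright off-diagonal stripes mark
  # repeats (e.g. a chorus heard twice), which are candidates for removal.
  similarity = chroma.T @ chroma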


> like those content aware scale picture memes, but to create the shortest possible audio

Are you referring to seam carving [1]? Your description of your personal project reminded me of it.

[1] https://en.m.wikipedia.org/wiki/Seam_carving


Yes! I believe Photoshop uses it in its Content-Aware Scale feature, though I am not sure: https://helpx.adobe.com/photoshop/using/content-aware-scalin...


Interesting. I've recently done a bit of searching in this space to find a project that would fit an idea I had: I'd like to use a Raspberry Pi Zero W to listen for our doorbell. If the doorbell rings, it should do something (e.g. send an SMS or turn on a light).

I couldn't really find anything; does anyone know if a project like this exists? As for the one listed here, I'm not sure whether it is fast enough to run on a slow device like the Zero W. Also, would it be able to detect audio in a continuous stream from, say, a microphone?


You're looking for a single-bit stream of information, and very likely you can find it as an electrical signal inside your doorbell already.

I wanted to replace the sound my wireless doorbell made, so I took the base station apart, and it was a very simple thing with three chips: a radio (NRF51), a microcontroller (PIC), and a blob of epoxy on a separate board that was connected to the speaker. It took maybe half an hour of beeping and scoping to understand how the PIC and the sound maker communicated - in this case, five pins to select one of 32 sounds and one pin to trigger playback. I simply took the playback trigger pin, connected it to a small MP3 player module, and moved the speaker from the internal sound maker to that.

If you can just attach wires to the button directly, it's even simpler.

Of course, if the object is to use a Pi Zero to do some DSP, this is missing the point. But there's a good chance it's the long way round if you want to solve the problem of knowing when somebody is at your door.
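
If you do go the wire-to-the-button route with a Pi, the software side is tiny. A sketch using gpiozero, where the pin number and the notify action are placeholders:

  # Sketch: doorbell button (or the chime's trigger line, via a suitable
  # interface circuit) wired to GPIO 17. Pin number and action are
  # placeholders.
  from gpiozero import Button
  from signal import pause

  bell = Button(17, bounce_time=0.05)   # debounce the mechanical switch

  def notify():
      print("ding dong")                # swap in an SMS/HTTP/MQTT call

  bell.when_pressed = notify
  pause()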


Same here. What I want to do is detect my dog barking excessively at people/cats/birds on the street and trigger my curtains to close for a few minutes... I've already got a wired camera in there, so processing the audio seems easiest technically, but I can't help thinking it's a really crazy waste of CPU time (even though it will be good for my neighbours).

I'd wondered if computing peak volume per second would be a good enough proxy, then triggering the action if a threshold is exceeded more than n times in 15 seconds... it certainly seems like it should be way less compute-intensive!
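
A rough sketch of that idea, assuming the sounddevice library and made-up threshold/rate values that would need tuning for the actual room and mic:

  # Rough sketch: count one-second blocks whose peak exceeds a threshold and
  # fire if that happens more than N times within a 15-second window.
  import numpy as np
  import sounddevice as sd
  from collections import deque

  THRESHOLD = 0.3        # peak amplitude (0..1), tune for the room/mic
  LOUD_SECONDS = 5       # "more than n times in 15 seconds"
  history = deque(maxlen=15)

  def on_block(indata, frames, time, status):
      history.append(np.abs(indata).max() > THRESHOLD)
      if sum(history) > LOUD_SECONDS:
          print("barking? -> close the curtains")
          history.clear()

  # blocksize == samplerate gives one callback per second of mono audio.
  with sd.InputStream(channels=1, samplerate=16000, blocksize=16000,
                      callback=on_block):
      sd.sleep(10 ** 9)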


Or, you know, just wire in the doorbell button and be done?


If your doorbell is electric and plays a recording of chimes, it can be very straightforward to implement this yourself. Just off the top of my head, I'd take FFTs (fast Fourier transforms) of a known recording of the doorbell, limit them to certain frequencies, normalize, etc., and compare them to the audio stream. This can be done in real time without any hardware acceleration. You can also go a bit further and implement something similar to the Shazam algorithm.

If it's an "analog" doorbell or a buzzer, it will be trickier.
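
A naive sketch of the spectrum comparison described above, using only numpy; the band limits, chunk length and 0.8 threshold are placeholder values:

  # Naive sketch: compare each incoming chunk's band-limited, normalized
  # magnitude spectrum against that of a known chime recording.
  # Chunks must be the same length as the template recording.
  import numpy as np

  def band_spectrum(x, fs, lo=400.0, hi=4000.0):
      spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
      freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
      spec = spec[(freqs >= lo) & (freqs <= hi)]
      return spec / (np.linalg.norm(spec) + 1e-9)

  def looks_like_chime(chunk, template, fs, threshold=0.8):
      # Cosine similarity between the two normalized spectra.
      return float(band_spectrum(chunk, fs) @ template) > threshold

  # template = band_spectrum(known_chime, fs), computed once at startup.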


OMG dude. Your doorbell is already operated by a pushbutton switch. This is the cheapest and most reliable input device you will ever find.



