Not totally related, but I've thought for some time that a similar technique could be applied to automatic subtitle synchronisation for media with external subtitles. The system would read the sentences, automatically match each one to the relevant segment of the audio track, and adjust the timing accordingly.
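A minimal sketch of one way that could work, assuming the subtitles are off by a roughly constant shift: build a crude voice-activity signal from the audio, an "expected speech" signal from the subtitle timings, and cross-correlate the two to find the best offset. Everything here (frame rate, threshold, the input formats) is an assumption for illustration, not a reference to any existing tool.

```python
# Minimal sketch: estimate a constant subtitle offset by cross-correlating
# a crude voice-activity mask from the audio with a speech/no-speech mask
# built from the subtitle timings. Assumes the audio is already loaded as a
# mono float array and the subtitles as (start_sec, end_sec) pairs.
import numpy as np

FRAME_HZ = 100  # resolution of the activity signals: 10 ms per frame

def voice_activity(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Per-frame RMS energy thresholded into a 0/1 speech mask."""
    hop = sample_rate // FRAME_HZ
    n_frames = len(audio) // hop
    frames = audio[: n_frames * hop].reshape(n_frames, hop)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return (rms > 0.1 * rms.max()).astype(float)

def subtitle_activity(cues, n_frames: int) -> np.ndarray:
    """0/1 mask marking frames where a subtitle cue is on screen."""
    mask = np.zeros(n_frames)
    for start, end in cues:
        mask[int(start * FRAME_HZ): int(end * FRAME_HZ)] = 1.0
    return mask

def estimate_offset(audio, sample_rate, cues, max_shift_sec=60.0) -> float:
    """Return the shift (in seconds) that best aligns the cues to the audio."""
    va = voice_activity(audio, sample_rate)
    sa = subtitle_activity(cues, len(va))
    max_shift = int(max_shift_sec * FRAME_HZ)
    shifts = range(-max_shift, max_shift + 1)
    # Score each candidate shift by how well the two masks overlap.
    scores = [np.dot(va, np.roll(sa, s)) for s in shifts]
    return shifts[int(np.argmax(scores))] / FRAME_HZ
```

A real tool would also handle drift (different frame rates) rather than just a fixed offset, but the overlap-scoring idea stays the same.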
Turn this into an iPad app with instruments and you have me as a client. Turn it into a plugin for Logic/Ableton etc. and I would be willing to pay monthly for it. I have been wanting this for years.
I can play instruments and I know my music theory. Sometimes I am just looking for easier, alternative ways to express myself. I think this would be welcomed by many, just like the vocoder was.
Thank you... your comment was very inspiring :). I'm considering adding a way to download the MIDI file; that's very low effort and would be of some use in the short term.
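Writing detected notes out as a standard MIDI file really is a small amount of code. A rough sketch using the mido library; the `notes` structure (pitch, start, duration, velocity) is a made-up stand-in for whatever the pitch-tracking stage produces, not Reedom's actual data model.

```python
# Rough sketch of exporting detected notes as a .mid file with mido.
from mido import Message, MetaMessage, MidiFile, MidiTrack, bpm2tempo, second2tick

def write_midi(notes, path="out.mid", bpm=120, ticks_per_beat=480):
    tempo = bpm2tempo(bpm)
    mid = MidiFile(ticks_per_beat=ticks_per_beat)
    track = MidiTrack()
    mid.tracks.append(track)
    track.append(MetaMessage("set_tempo", tempo=tempo, time=0))

    # Flatten note starts and ends into one time-sorted event list.
    events = []
    for pitch, start, duration, velocity in notes:
        events.append((start, Message("note_on", note=pitch, velocity=velocity)))
        events.append((start + duration, Message("note_off", note=pitch, velocity=0)))
    events.sort(key=lambda e: e[0])

    # MIDI message times are delta ticks since the previous event.
    now = 0.0
    for t, msg in events:
        msg.time = int(second2tick(t - now, ticks_per_beat, tempo))
        track.append(msg)
        now = t
    mid.save(path)

# Example: three detected notes (MIDI pitch, start s, duration s, velocity).
write_midi([(60, 0.0, 0.5, 90), (64, 0.5, 0.5, 90), (67, 1.0, 1.0, 100)])
```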
Yeah, that would be amazing. I am literally working on a song where I would love to play around with rhythm a little bit. As I do a pretty good human beatbox, I would love to be able to explore a couple of things there.
As I said, I would buy it, and if I were you I would probably turn it into a hosted service, as you probably need the cloud to do some of the computation?
Anyway, if nothing else I think it would have a future as a plugin or service. And for a lot of novice music enthusiasts it would be a great little iPad/tablet app.
Yeah, the app needs work, like quantization for both rhythm and scales and so on. It could be a really powerful tool, I think.
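For what it's worth, a minimal sketch of what that quantization could look like on extracted notes: snap onsets to a rhythmic grid and pitches to the nearest note of a chosen scale. The note format, the 1/16-note grid, and the C-major default are assumptions for illustration only.

```python
# Minimal sketch of quantizing extracted notes: snap onsets to a rhythmic
# grid and pitches to the nearest note of a chosen scale.
C_MAJOR = [0, 2, 4, 5, 7, 9, 11]  # pitch classes of the scale

def quantize_time(onset: float, bpm: float = 120.0, division: int = 4) -> float:
    """Snap an onset (seconds) to the nearest 1/division-of-a-beat grid line."""
    grid = 60.0 / bpm / division
    return round(onset / grid) * grid

def quantize_pitch(midi_pitch: int, scale=C_MAJOR) -> int:
    """Snap a MIDI pitch to the nearest pitch class in the scale."""
    octave, _ = divmod(midi_pitch, 12)
    candidates = [octave * 12 + pc + shift for shift in (-12, 0, 12) for pc in scale]
    return min(candidates, key=lambda p: abs(p - midi_pitch))

notes = [(61, 0.27), (66, 0.52), (70, 1.04)]  # slightly off-key, off-grid
cleaned = [(quantize_pitch(p), quantize_time(t)) for p, t in notes]
print(cleaned)  # e.g. [(60, 0.25), (65, 0.5), (69, 1.0)]
```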
Make a web app at least, and let me export some useful format for Logic.
Then you can work on other ways to use the voice to inform the sounds and other things. As I said, think about it like a vocoder-MIDI instrument/sequencer. Something really useful can come out of that, IMO.
You have made my dreams come true. Wow, some of the most exciting things I have seen on HackerNews in the last decade have shown up just in the last year. Thank you for building this!
What did you guys use? TensorFlow? Any chance it will be open sourced so we can learn from it?
Thanks! It's mostly DSP, sound-processing algorithms. The idea was to reimplement most of it using deep learning if we had the budget/chance; it was meant to be a prototype.
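The article doesn't spell out the algorithms, but a classic DSP building block for this kind of hum-to-note conversion is autocorrelation-based pitch detection on short frames, followed by a Hz-to-MIDI mapping. A toy sketch, with frame size, pitch range, and thresholds as illustrative assumptions, not Reedom's actual pipeline:

```python
# Toy autocorrelation pitch detector: the kind of DSP step a voice-to-
# instrument pipeline might start from. Parameters are illustrative only.
import numpy as np

def detect_pitch(frame: np.ndarray, sample_rate: int,
                 fmin: float = 80.0, fmax: float = 800.0):
    """Estimate the fundamental frequency of one audio frame, or None."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sample_rate / fmax), int(sample_rate / fmin)
    if hi >= len(corr) or corr[0] <= 0:
        return None
    lag = lo + int(np.argmax(corr[lo:hi]))
    # Reject frames where the peak is weak (likely unvoiced or noise).
    if corr[lag] < 0.3 * corr[0]:
        return None
    return sample_rate / lag

def hz_to_midi(freq: float) -> int:
    """Convert a frequency in Hz to the nearest MIDI note number."""
    return int(round(69 + 12 * np.log2(freq / 440.0)))

# Example: a synthetic 220 Hz tone should come out near MIDI note 57 (A3).
sr = 16000
t = np.arange(2048) / sr
print(hz_to_midi(detect_pitch(np.sin(2 * np.pi * 220 * t), sr)))  # 57
```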
It says so in the article: it's mostly DSP with a bit of minor machine learning. It's also pre-TF. If you're so interested in it, why did you not read it?
This is really exciting. As someone who spends far too much time messing around making electronic music, I had been thinking about how cool it would be to have a voice-to-instrument converter. After all, if a computer can create the sounds of an orchestra or band from just your vocal beatboxing / humming chops and a sprinkle of machine learning, then that's all you need to make the next hit song, right?
Well, not trying to detract too much from this field, but that's where I think the expectations of grandeur that some people will have about this emerging technology are kind of wrong. There are a few issues I can think of that will need to be overcome (if that's even possible) before this technology can be considered groundbreaking.
From the 'person' side:
(1) Vocal range and ability >> This one is going to be the hardest, as most people can only sing within a couple of octaves, and even then need quite a bit of 'tuning' help to identify the right notes / intentions.
(2) Music theory >> Following on from the above, if you don't have any music theory knowledge (about what key you're supposed to be in, or about chords, or progression), then a machine learning process would have to fill in all the chords and textures / build-up. But that can really only help you so much, unless you're just interested in the 'novelty' aspect of it.
From the 'machine' side:
(3) 'Bum' notes and 'quantisation' >> I noticed from the samples that the saxophone would produce flutter around a note, and also that the timings weren't 'on the beat', but fixing this should be quite trivial.
(4) Expressiveness >> Right now, the instruments might be adjusting to the input velocities (loudness), but if you want a life-like sax sound then you'll need to produce a level of expressiveness that makes it sound realistic... This can be achieved with a mixture of better (available) virtual instrumentation, and also ML to figure out what the intentions of the input are (a rough sketch of this kind of loudness-to-expression mapping follows below).
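To make point (4) concrete, here is a small sketch of one loudness-to-expression mapping: track the input's amplitude envelope and turn it into per-note velocities plus a continuous CC11 (expression) curve so a sampled sax can swell and fade. Frame size, the CC choice, and the helper names are my own assumptions, not anything from the article.

```python
# Sketch of mapping the input's loudness envelope to MIDI expression:
# one velocity per note onset plus a continuous CC11 curve.
import numpy as np

def amplitude_envelope(audio: np.ndarray, sample_rate: int, hop_ms: float = 10.0):
    """Per-frame RMS level, normalised to 0..1."""
    hop = int(sample_rate * hop_ms / 1000)
    n = len(audio) // hop
    frames = audio[: n * hop].reshape(n, hop)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return rms / (rms.max() + 1e-12)

def envelope_to_cc11(env: np.ndarray, floor: int = 10):
    """Map the 0..1 envelope to MIDI CC11 'expression' values (floor..127)."""
    return [int(floor + e * (127 - floor)) for e in env]

def onset_velocity(env: np.ndarray, onset_frame: int) -> int:
    """Pick a note-on velocity from the envelope peak just after an onset."""
    window = env[onset_frame: onset_frame + 5]  # ~50 ms look-ahead
    return int(1 + window.max() * 126)
```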
I think the two big key factors are music theory and expressiveness. With the former, anyone who has a good enough understanding of music theory probably wouldn't have much use for this beyond a novelty until it produces compelling, effortless results, because they're most likely to have some instrumental knowledge (keyboard especially, for virtual instruments) and music composition experience. The two groups this kind of tech would really help are a) very good vocalists who have no electronic composition experience but a good grasp of music theory (Michael Jackson is a good example: this is how he sketched out his song ideas[0]), and b) music producers / instrumentalists who want to fill in a lot of parts quickly by humming / beatboxing, though again, they can probably do that with a keyboard reasonably effortlessly. If the 'expression' side is worked on a lot, it will become more interesting, as expressive effects are hard to get right without expensive hardware.[1]
As much as I appear to be 'bashing' this, I am reservedly excited, but I can't help wondering whether there will actually be a worthwhile market for the tech!
The fella behind imitone (https://imitone.com/) has been grinding away at the details of this stuff for a while. I haven't checked in on it recently, but I was a Kickstarter backer. The last version I loaded up and played with was pretty great.
We studied imitone before working on Reedom; it's a great and inspiring project. The approach and use cases are slightly different. The main difference is real-time use: imitone focuses mainly on real time, while we didn't see much value in that.
Our real goal was to allow the user to create complete songs, using loops created with Reedom. We had a prototype iOS app for that, but we didn't get that far.
Oh cool, thanks for the response and explanation. For me, real time is a big deal: I work with adults with developmental disabilities, and anything that helps with having fun jamming with me and my musician friends is really interesting. I had fun combining imitone (or other inputs) with loops created on iOS with Figure. It's a really interesting space, and thank you for sharing your work.