Algorithmic Music Generation With Recurrent Neural Networks [video] (youtube.com)
67 points by rndn on June 24, 2015 | 31 comments



The funny thing is that the only really good ones are the first few, where they claim it's just random noise. The later ones just sound like a crappy radio.

With images the technique works because we like looking at the dense artifacts. Millions of dog heads essentially copy and pasted onto any appendage that looks like it should be a head. It looks like drugs and overwhelms in the same way.

If you just take white noise and throw it through a tuned filter bank (which is in essence and in final effect what they are doing here) then you just get crappy audio.
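
To make that concrete, here's a minimal sketch of the "tuned filter bank" idea in Python (the band edges and sample rate are made up, and this is an analogy for the effect, not their actual pipeline):

    import numpy as np
    from scipy.signal import butter, lfilter

    fs = 16000                              # sample rate (Hz), arbitrary
    noise = np.random.randn(fs * 5)         # 5 seconds of white noise

    out = np.zeros_like(noise)
    for lo, hi in [(80, 160), (400, 800), (2000, 4000)]:   # made-up bands
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        out += lfilter(b, a, noise)         # noise shaped by each band

    out /= np.max(np.abs(out))              # normalized; sounds "tuned", not composed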

The more standard and successful use of NNs in composition is to use them on pitch series and compositional forms. Like feeding in all of Beethoven and then getting them to generate similar compositions. That's been going on for decades. You can do it with that kind of data.
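
The decades-old symbolic flavor of this can be as simple as a Markov chain over pitches (a toy stand-in for the NN versions; the melody here is invented):

    import random
    from collections import defaultdict

    melody = [60, 62, 64, 65, 64, 62, 60, 67, 65, 64, 62, 60]  # MIDI pitches

    transitions = defaultdict(list)
    for a, b in zip(melody, melody[1:]):
        transitions[a].append(b)            # learn which pitch follows which

    note, generated = melody[0], [melody[0]]
    for _ in range(16):
        note = random.choice(transitions[note])  # sample a plausible successor
        generated.append(note)
    print(generated)                        # a "similar composition", Beethoven-free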

But the thing about pop and electronic music is that the easily machine-observable elements are not very interesting. Listen to the 4/4 kick and snare pattern in the video. It's boring as hell. (Other tracks can be just a kick and snare and they are amazing and we celebrate them as classics and play them for 20 years. Machines will never understand why.)

What's great and essential are things like the spatial relationships between elements in the mix: how the surge of the compressed synth/guitar causes the beat to tumble outward and stir you up; how, after a series of peaks in a synth melody, the next pass pulls back, creating a space that tugs at your heartstrings. You create a negative space that the listener goes into. You play with listeners' expectations based on what songs, conventions and tropes they already know and respond to.


The image stuff works because we have found a way to model a good prior for it: convolution layers are basically enforcing some positional invariance and locality constraints on what our model believes the world looks like. Without this very strict prior, image recognition with neural networks just wouldn't really work.
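
For what it's worth, the prior is easy to see in code: one small kernel reused at every position is all a convolution layer is (a plain numpy sketch):

    import numpy as np

    def conv2d(image, kernel):
        # The same few weights slide over every position:
        # weight sharing = translation invariance, small kernel = locality.
        kh, kw = kernel.shape
        H, W = image.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    features = conv2d(np.random.rand(28, 28), np.array([[1.0, -1.0]]))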

We haven't found a way to enforce a good prior for temporal data like sound yet.


I realised a while back a lot of computer music is really computer science music - it's people who know something about computers but not much about the art of music, playing with relatively trivial algorithms to create music-like results.

There's also the academic musical equivalent - music professors using stock faddy techniques like serialism or (currently) number and group theory.

It's not that this is an impossible problem. It's more that the set of people who can code machine learning algorithms and understand music theory and are creative enough to invent new algorithmic techniques and to create more-than-listenable music is incredibly small - double figures, if that.

So progress in non-trivial computer music has been incredibly slow. The DSP side has been far more successful, because DSP is - in most ways - a much simpler problem.


Music is (dis)harmonies over rhythmic patterns. There isn't anything inherently artistic about humans that computers can't replicate in time, even the ability to compose an original song. That is, besides the lives of humans and their appearance and history, which are important but not the only factor.

The irony is that musicians are actually striving for, but failing to reach, the perfection level that computers have.

And so, for computers to sound more human-like, they have algorithms that make them more "sloppy".
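
The simplest version of that "sloppy" trick is just timing jitter on quantized notes (a toy sketch; the jitter amount is made up):

    import random

    def humanize(note_times_ms, jitter_ms=10):
        # Nudge each quantized onset by a few milliseconds either way.
        return [t + random.uniform(-jitter_ms, jitter_ms) for t in note_times_ms]

    print(humanize([0, 500, 1000, 1500]))   # a perfect grid, made "human"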

What composition algorithms lack is not the ability to compose like humans but a life that will give them angels and a story.

Then again, a lot of music is really formulaic anyway, and computers are used for most of it. Nothing will hinder some sort of computer star from being born in a few years. But it's probably never going to connect with us the same way another human can. Not for now, at least.


I think that's a good example of what I'm saying - just because you don't understand the details doesn't mean professional musicians and composers don't have much deeper insight into music than you do.

If you think music is [list of numbers] that can be made more "human" with a bit of timing randomisation, then of course it's all perfectly straightforward.

In reality there's rather more happening.

>What composition algorithms lack is not the ability to compose like humans but a life that will give them angels and a story.

No, the music basically sucks as music. The number of people willing to listen to it voluntarily without being paid to - usually as students or academics - is vanishingly small.

The story part only becomes relevant after that problem is solved.

And while it's true that music is formulaic, it's also true that computer music hasn't yet worked out how to copy all the details of the formulas - never mind produce original and memorable new formulas from scratch.

The best formula copier is probably Cope's EMI, and that sounds exactly like what it is - a slightly confused cut-and-paste cliche machine, not a human composer with a point to make.


I think you have it the wrong way around.

Music becomes meaningful in the listener's mind, and the things that make it meaningful are both that it's formulaic (structure) and whatever the performer instills in the listener.


"What composition algorithms lack is not the ability to compose like humans but a life that will give them angels and a story."

Quote of the month for me. Bravo =]


I somewhat agree about computer music, though it appears to be an extension of the process driven composition that has been part of western (art) music for a while now (e.g. modulation).

Sometimes I wonder if linguistics has more to offer composition than algebra (speculation, as I know next to nothing about (non-CS) linguistics).

> It's not that this is an impossible problem. It's more that the set of people who can code machine learning algorithms and understand music theory and are creative enough to invent new algorithmic techniques and to create more-than-listenable music is incredibly small - double figures, if that.

Are you pursuing something like this? Or know anyone who is? This is one of my main interests (alongside better interfaces for composition and well, making music). I'm actually back at University for my second degree to study this sort of thing.

If you have a blog or anything I'd be interested to find out...


The reason for it has a lot to do with how Academia works. You need to write papers and produce innovative works. If you don't then you won't get funding or advance your career.

> or (currently) number and group theory.

That was Xenakis back in the 1950s!

Xenakis is the exception. He's really the foundation of academic computer music and his music is amazing and moving and his compositional concepts are still being hacked out by music programmers today.



It helps, but it's not necessary. Schmidhuber obtained state-of-the-art performance on MNIST with a very vanilla deep net (no convolutional layers, no pooling, nothing fancy, just fully connected sigmoid units).
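
In the spirit of that result (not a reproduction; the actual nets were much larger and used heavy data augmentation), a plain fully connected sigmoid net with no convolutions looks like this in scikit-learn:

    from sklearn.datasets import fetch_openml
    from sklearn.neural_network import MLPClassifier

    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    X = X / 255.0

    net = MLPClassifier(hidden_layer_sizes=(500, 300),   # sizes are illustrative
                        activation="logistic", max_iter=50)
    net.fit(X[:60000], y[:60000])
    print(net.score(X[60000:], y[60000:]))  # fully connected, no spatial prior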


MNIST is really easy, it's not a useful benchmark any more.


Sure, but the fact that you can get state-of-the-art results without a convolutional prior ought to at least support the argument that the prior is not necessary.


You can get something like a 5% error rate on MNIST with a well-tuned linear classifier; it's not comparable with ImageNet. Note that the computer vision techniques used before convnets relied on things like SIFT features, which are another way of imposing a (sort of) prior. I do believe that some sort of strong prior is necessary for the problem.
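
For reference, the linear baseline being described is a few lines (the exact error depends on tuning; I'd expect somewhere in the 5-8% range untuned):

    from sklearn.datasets import fetch_openml
    from sklearn.linear_model import LogisticRegression

    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    clf = LogisticRegression(max_iter=100).fit(X[:60000] / 255.0, y[:60000])
    print(1 - clf.score(X[60000:] / 255.0, y[60000:]))   # error rate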


I think that music is not just the sound and not just the observable elements. It literally moves your emotional system around. We get shivers, we get horny, our minds shut down and we get swept away in memories. That's not because of any specific arrangement of machine-observable sound objects.

> on what our model believes the world looks like

It's not even doing that. Those images are just simple 2D fields of pixels with color data.

It has no idea about any worlds. The spatial cues are way off, but we understand art (impressionism through messy expressionism, glitch) so we forgive it. And because they chose images with puppy eyes and we like puppies. Take away the animal elements and everybody would say the images were boring crap.


> Machines will never understand why

Humans don't really understand why either. If that didn't stop us, why should it stop a machine?


> Its boring as hell. Machines will never understand why

That statement seems overly strong. Perhaps we are far from having a machine that can figure out the high-level aspects of a song on its own, but I don't think it's implausible that with some more guidance (for example, by also learning loop arrangements and filters instead of only the waveforms) these neural networks could create quite interesting music today (especially in the EDM/IDM genre). This development might be scary because it possibly replaces human creativity to a large extent, but you can't stop it by claiming that it's impossible or that it will always be of poor quality. People said the same when synthesized music arrived, that it lacks human aspects and so on, and now it has high cultural significance, even though it uses things like auto-tune and consists of super-clean loops.


> for example by learning also loop and sample arrangements instead of only the waveforms

This is what I mean: it's not about the loop and sample arrangements.

I've been doing exactly that since the '90s and I've made some great twisted computer-generated stuff. Logical or predictable methods will always result in logical and predictable music. It's all about setting up code environments so that you can generate accidents and mutations and capture them. It's all about reacting, capturing it and putting it on wax (historically speaking).

But that doesn't mean that a computer can understand why or even recommend what is amazing. We humans don't even know why some things are so great. A major thrill is finding some sound that is so twisted (ill, stoopid, sick) and totally bypasses the rational mind, shuts your thinking down and you get a big smile and start jumping up and down acting like an idiot.

Then somebody else copies your track, then it becomes a style, then it becomes a cliche, then Beatport is filled up with boring copies, then it becomes a sample set that people can buy, then somebody makes an app that can auto-generate that style, and then they claim that computers are making music.

But they are just playing it back, just like all the human copycats further upstream.

And the entire network of software, creators, audience and culture is what we call music.

> This development might be scary because it possibly replaces human creativity to a large extent, but you can't stop it by claiming that it's impossible or that it will always be poor quality.

That's what I've spent a lot of my life engaged in. It often makes great music, but the machine cannot understand why. You'll need strong AI for that and it will need to have tensions, depressions, a body, chemical feelings, sex drive, longing and a strong psychological need to be lost in song. Then it could say "David, I think I've found a song you might like."


> This development might be scary because it possibly replaces human creativity to a large extent, but you can't stop it by claiming that it's impossible or that it will always be of poor quality.

What is scary is the willingness of some people to accept dubious randomly-generated artifacts as "art" and as evidence of consciousness and intelligence. You should not trivialize actual human intelligence and creativity simply because you want to believe in strong AI.

The GP post makes very good points and (as usual, it seems) they get completely ignored in the replies.

Setting up some 4/4 beat is easy. And you don't need an AI to throw in a randomly arpeggiated chord there. It will sound okay-ish. Maybe it will even sound better than the stuff in this video. But that doesn't mean it is equivalent to proper music composition or that anyone would actually listen to that stuff outside of a technical demo.

BTW, if you want to see real procedurally generated music that is recorded by musicians for real listeners (as opposed to by CS grads for other CS grads), you might want to take a look at Karma: http://www.karma-lab.com .


I was actually referring more to the general public, not to people with highly sophisticated music tastes. This is a first-cultural-world problem, so to speak. The vast majority will probably be completely happy with an app that composes their infinite personal soundtracks based on their Facebook profiles (similar to how Ray Kurzweil suggested that we may have chat agents that fool most people very soon). That might be a harsh reality for people in the first cultural world, but it wouldn't be the result of the general public wanting to believe in strong AI; the music would actually give them the same satisfaction they receive from man-made music (or even more, due to the personalization). I believe this is a somewhat plausible prediction, and claims like "computers are incapable of human emotion and ingenuity!" will not change anything (except for your mood).


Are we discussing AI or the supposed lack of music taste in the general public? Again, saying "people are stupid" does not demonstrate that a particular AI is smart or interesting. And anyway, people are smarter than they often get credit for.

> the music would actually give them the same satisfaction they receive from manmade music

You believe this because...? None of the generated tracks I've heard so far is anywhere near the kind of music people would listen to for the sake of the music itself.

Some people do listen to music to drown out distracting conversations around them. A perfectly sensible thing to do. However, in those cases music can be replaced with white noise or nature sounds. This does not demonstrate that music is equivalent to noise.

...

The problem with such AI demos is that they lack a particular purpose (i.e. success criteria), use existing material for training and do not go a step beyond that material. You can take a track and slow it down by a random fraction to get a different pitch and tempo. Voilà. "New music." Procedurally generated. But that does not mean you've composed something. You've used a randomized algorithm hand-crafted to produce a particular effect. It has no generative power.
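
That cheap trick, spelled out (mono audio assumed, file names made up; naive resampling shifts pitch and tempo together):

    import numpy as np
    import soundfile as sf                   # assumes the soundfile package

    audio, fs = sf.read("track.wav")         # mono assumed for simplicity
    factor = 1 + 0.2 * (np.random.rand() - 0.5)          # random +/-10% stretch
    idx = np.arange(0, len(audio) - 1, factor)
    stretched = np.interp(idx, np.arange(len(audio)), audio)
    sf.write("new_music.wav", stretched, fs) # "new music", zero composition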


I've already given the examples of auto-tune, loops and synthetic sounds. People care surprisingly little about authenticity, and the top charts are telling about the intellectual preferences of the masses. Perhaps an RNN could even be trained with recordings of human emotional responses like heart rate and goosebumps. It's not just about changing the pitch here and there, but about learning deep features of what makes music interesting to listen to.

Music doesn't need to be as coherent as a text (in fact some space for interpretation is often preferred), which also increases my confidence about this prediction.


I think I like your musical taste.


I'm not 100% up to speed on my AI, but this sounds about like what you'd get with random variations on a signal, where the neural net is the "which sounds like X" filter and picks one of the two to survive. But that would be using both some form of genetic algorithm (details TBD?) and the neural net as the checker. But is it?

If they aren't doing it that way, I'd be interested in hearing how it's evolving the signal in that given direction - and also how that filter works (what libraries does it use?).
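
Roughly the loop I'm imagining (pure speculation about their method; the scoring function here is a toy stand-in for the neural net):

    import numpy as np

    target = np.sin(np.linspace(0, 40, 1000))            # toy "song"

    def score(signal):
        # Stand-in for the "which sounds like X" network.
        return -np.mean((signal - target) ** 2)

    pop = [np.random.randn(1000) for _ in range(32)]
    for _ in range(200):
        pop.sort(key=score, reverse=True)                # keep the fittest
        pop = [p + 0.05 * np.random.randn(1000)          # mutate survivors
               for p in pop[:8] for _ in range(4)]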

Sounds like it hit some sort of local maximum, so this system won't ever produce the original song, just something a percentage of the way toward it.

I'm a bit more interested in algorithmic composition, but this could be interesting if trying to blend genres. For a long time I've wanted to build a program that could produce essentially an infinite song morphing between genres with lots of tunable parameters.


It would be interesting to know how novel those sequences are (obviously, the outcome would be far less impressive if what we hear is basically a looped, noisy sample of a song that already exists).


Not much information in the video on how this was achieved, but a quick search for "gruv algorithmic music generation" returns the following: https://sites.google.com/site/anayebihomepage/cs224dfinalpro... . Extract:

We compare the performance of two different types of recurrent neural networks (RNNs) for the task of algorithmic music generation, with audio waveforms as input (as opposed to the standard MIDI). In particular, we focus on RNNs that have a sophisticated gating mechanism, namely, the Long Short-Term Memory (LSTM) network and the recently introduced Gated Recurrent Unit (GRU). Our results indicate that the generated outputs of the LSTM network were significantly more musically plausible than those of the GRU.
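
A minimal sketch of that setup, an LSTM predicting the next chunk of raw waveform (shapes and sizes are guesses, not the paper's actual architecture; uses Keras for brevity):

    import numpy as np
    from tensorflow.keras import layers, models

    chunk = 2048                             # waveform samples per timestep (a guess)
    model = models.Sequential([
        layers.LSTM(512, return_sequences=True, input_shape=(None, chunk)),
        layers.Dense(chunk),                 # predict the next chunk of audio
    ])
    model.compile(optimizer="adam", loss="mse")

    X = np.random.randn(4, 16, chunk)        # fake (batch, time, chunk) batches
    y = np.roll(X, -1, axis=1)               # next-chunk targets
    model.fit(X, y, epochs=1)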


Another promising field is RNN applied to TED talks: youtube.com/watch?v=-OodHtJ1saY


I think it would sound better if we taught the neural network notes and music theory.


This is not music. Music is not simply organised sound. Music is a cultural practice.


This is not a cultural practice?


Source code link?



