Generating music with expressive timing and dynamics (tensorflow.org)
126 points by iansimon on June 29, 2017 | 22 comments



It's refreshing to hear generated piano music that isn't either strictly metrical or entirely freeform; in places you get a fairly natural sense of rubato and sensitive dynamic shaping. It's sort of convincingly improvisatory. The constantly shifting harmonic idiom is disorienting in a not very pleasant way (the worst kind of Chopin + Ligeti mashup), especially when you raise the temperature. It would be interesting to use period- or style-specific training sets.

To my ears the 5:00 clip does have a larger structure: there are clearly extended passages building up to and ebbing away from big climaxes, with a real sense of sustained intensification, though if you follow the detail everything is assembled from lots of fleeting, unrelated ideas.


> "It's sort of convincingly improvisatory."

I'm not sure if it was the dynamics specifically, but it was clear to me that A was human. Within 30 seconds I was so sure that I hit pause and loaded the answer to see if I was right. (I was, and I'm likely the worst pianist on these forums and only a casual fan of music that falls into the 'classical' genre.)

Here's[0] a fabulous physics paper that analyzes the 16th-note timing of a studio drummer widely considered one of the best in his field. IIRC, the paper mentions he couldn't record with a click because it would throw him off. That said, the quality of the recording didn't suffer (his second take of the track was more than good enough for the rest of the musicians to record against), so his own 'internal metronome' was more than adequate. The interesting thing wasn't that his playing was incompatible with a click track, but that the timing skew evolving over each phrase was well described by a mathematical model. The study compared his recording against a corpus of user submissions of the same track, and all of those drummers followed a similar set of dynamics. So presumably all humans (or at least all western drummers who elected to submit their recordings) share that same skew intrinsically.
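The gist of the analysis is easy to play with yourself: take the onset time of each 16th note, subtract the corresponding position on a rigid metronomic grid, and watch how the deviation drifts over a phrase. A toy sketch of that idea (the onset times and tempo below are made up for illustration, not data from the paper):

    import numpy as np

    # Hypothetical onset times (in seconds) of consecutive 16th notes, as you
    # might extract from a recording. These numbers are made up for illustration.
    onsets = np.array([0.000, 0.162, 0.331, 0.498, 0.655, 0.820, 0.991, 1.160])

    tempo_bpm = 92.0                      # assumed tempo
    grid_step = 60.0 / tempo_bpm / 4.0    # ideal spacing between 16th notes

    # Deviation of each onset from a rigid metronomic grid anchored at the first note.
    grid = onsets[0] + grid_step * np.arange(len(onsets))
    deviation_ms = (onsets - grid) * 1000.0

    for i, dev in enumerate(deviation_ms):
        print(f"16th note {i}: {dev:+6.1f} ms vs. the metronomic grid")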

It would be interesting to know whether it's a byproduct of culture (like an accent) or something intrinsic to humans. In fact, that itself would make an interesting study: compare the patterns of a traditionally schooled western jazz drummer versus a traditional African drummer versus an Indian tabla player. The end of the paper suggests additional avenues to explore, but who knows, maybe soon drumming will be 'solved'?

I'm totally with you on seeing how it would do when trained on recordings from a specific region and/or era. The results would be terribly interesting! Or trained just on some particular virtuoso, like Gould on Bach or Horowitz on Chopin.

As I understand it, there are basically just a handful of songwriters out there (Shane McAnally is a prime example) who write songs for the major country-pop artists. If you have a listen to this[1], you can really hear how similar the songs are. (This isn't exclusive to country music; the 90s pop I grew up loving is pretty much the same, as demonstrated by Rob Paravonian[2].) There's probably a lot of money in automated songwriting for Katy Perry and her entourage. Startup idea for any of you kids.

IIRC, there's already a startup using Pinterest, Tumblr, and more obscure sites like Lookbook to analyze and predict trends in clothing and interior design, which design houses can pay a semi-nominal fee to access. H&M is great at pumping out high-street copies within a season, but imagine being able to actually beat Tom Ford to market.

There are also interesting sociological implications. The culture of chess changed with Deep Blue. When I first read about AlphaGo I was floored (I mean really; I had previously thought Go would be intractable within my lifetime due to the huge configuration space). As these 'good enough' models emerge, they have wide implications for human culture as a whole.

I wonder how it will affect the value of artists (in any genre). An ex of mine who hated basketball (this was during the Kobe/Paul Pierce days) still recognized the genius when I showed her some Michael Jordan clips. Certainly an artist in his craft. I'm not a fan of Lady Gaga[3], but when I saw this performance I could immediately see a significant amount of talent. Walter Murch is an absolutely amazing film editor; will he be reduced to a Final Cut Pro plugin? If I managed to get my hands on the all-22 footage (for every NFL game, an overhead camera records the whole field so coaches can analyze their opponents) for every team, could I out-tactic Bill Belichick?

==

[0] journals.plos.org/plosone/article?id=10.1371/journal.pone.0127902 (Seriously, it's a fantastic paper.)

[1] https://www.youtube.com/watch?v=FY8SwIvxj8o

[2] https://www.youtube.com/watch?v=JdxkVQy7QLM

[3] https://www.youtube.com/watch?v=oP8SrlbpJ5A


> but it was clear to me that A was human

I believe you're thinking of a different project/article that was on the HN front page a week or two ago (but which I can't seem to find).

This article doesn't have any "which is human" bits (and the results here are a lot more impressive than the one you're thinking of).


I'd never seen the paper on Porcaro's drumming (or indeed found anything this deep; I've done everything with my own very simple analysis), and it's fantastically detailed and looks very interesting. I'm a music tech teacher and producer, and this is an ideal bit of information to take in when teaching people how to make their sequencing more convincing. Thanks for posting it.


It seems that this model has no notion of "cadence" (the punctuation in musical grammar, given by harmony and tonality). Expressivity has to be correlated with the harmonic grammar, otherwise it doesn't make sense. Unfortunately the samples in the article don't sound very good to me, and I'm pretty sure that's why.


This is stunning! Great stuff.

Since the input and the prediction form a single sequence, did you experiment with beam search / stochastic beam search decoding (maybe with additional diversity criteria)?

I found that even simple models (markov chains) got a big diversity boost with a stochastic beamsearch - it might avoid the problems with low temperature repetition that could happen in a standard beamsearch. However, my music models are much, much, (much) worse than this, so my relative improvement might be related to that.
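To unpack that a bit: a deterministic beam search keeps the k highest-scoring continuations at every step, which is where the repetitive, low-temperature feel comes from; the stochastic variant samples continuations and survivors instead. A rough sketch of what I mean, where `next_probs(seq)` is a stand-in for whatever model provides the next-event distribution (not anything from the article's code):

    import numpy as np

    def stochastic_beam_search(next_probs, start, steps, beam_width=8, expand=4, seed=0):
        """Grow candidate sequences by sampling continuations rather than always
        keeping the exact top-k, which helps avoid the repetitive feel of a
        deterministic beam. `next_probs(seq)` is an assumed stand-in for the
        model: it returns a probability vector over the next event."""
        rng = np.random.default_rng(seed)
        beams = [(list(start), 0.0)]                 # (sequence, accumulated log-prob)
        for _ in range(steps):
            candidates = []
            for seq, logp in beams:
                p = next_probs(seq)
                events = rng.choice(len(p), size=expand, p=p)   # sample, don't argmax
                for e in events:
                    candidates.append((seq + [int(e)], logp + float(np.log(p[e]))))
            # Resample survivors in proportion to score instead of a hard top-k cut,
            # so the beam itself keeps some diversity.
            scores = np.array([c[1] for c in candidates])
            w = np.exp(scores - scores.max())
            w /= w.sum()
            keep = rng.choice(len(candidates), size=min(beam_width, len(candidates)),
                              replace=False, p=w)
            beams = [candidates[i] for i in keep]
        return max(beams, key=lambda b: b[1])[0]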

Similarly, I am finding really nice results in text (RNN-VAE) with scheduled sampling, it might be worth experimenting with.

I am amazed at how good this next-step sampled output is. The above ideas might just hurt the result, I am having a hard time imagining how it could be better.

What soundfont/midi rendering package is used for this? The piano sound is really rich.

Looking forward to hearing what creative things users will do with this model.


Hey Kyle, we didn't try anything more advanced than next-step sampling. You probably have a better sense than I do how much improvement such techniques are likely to yield. My unfounded suspicion is that we're close to the limit of generation quality from this dataset, and so I'm most interested in trying to gather 10-100x more skilled performances, one way or another.

There's also no consensus on whether the high- or low-temperature samples sound better. I've heard both opinions from several people.
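(For readers following along: "temperature" here just rescales the model's next-event distribution before sampling; lower values sharpen it, higher values flatten it. A minimal sketch with made-up logits, not output from the actual model:)

    import numpy as np

    def sample_with_temperature(logits, temperature, rng=np.random.default_rng()):
        """Divide the logits by the temperature, renormalize, and sample one event.
        temperature < 1.0 sharpens the distribution (safer, more repetitive);
        temperature > 1.0 flattens it (more surprising, more chaotic)."""
        scaled = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    # Made-up logits over four hypothetical next events, purely for illustration.
    logits = [2.0, 1.0, 0.2, -1.0]
    print([sample_with_temperature(logits, t) for t in (0.5, 1.0, 1.5)])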

Sageev did the final rendering, not sure what he used but I'm pretty sure it was nothing too fancy.


A bigger dataset of MIDI with velocity information and performance timing would be really, really great.

High temperature versus low is tough to compare - I find that sometimes low temperature seems better, then I change the random seed and my opinion flips.

Same for stochastic versus deterministic beam search, length/diversity scoring, and so on. I have been meaning to blog on this, will send it your way when I get it posted.

For character text, stochastic seems broadly nicer (maybe due to the limited size of the Markov space; see [0] deterministic vs. [1] stochastic), but for music it depends on the representation I use. However, at least in this cherry-picked example, I find the repetition of the deterministic beam search hilarious even though it is "worse".

Interesting, I will have to ask him what it was. With that render, at least my bad samples will sound prettier.

Great job on the model again!

[0] https://badsamples.tumblr.com/post/160767248407/a-markov-arg...

[1] https://badsamples.tumblr.com/post/160777871547/stochastic-s...


I think the choice of piano really sells the quality of the result. Musically it's not that great, since it still sounds like random noodling, but it's much better than any other implementation I've heard.


Could it mean that you could generate music for games that would follow the action and help build up tension?


This is already done in a lot of games, but those are precomposed parts that are dynamically morphed into each other when the action changes.
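A common version of this is "vertical layering": several precomposed stems play in sync and are faded in or out as some game-state intensity value changes. A toy sketch of just the gain logic (stem names and thresholds are invented, not from any particular engine):

    # Toy sketch of "vertical layering": each precomposed stem gets a gain driven
    # by a single intensity value from the game state. Stem names and thresholds
    # below are invented for illustration.
    STEMS = {
        "pads":       (0.0, 0.1),   # effectively always present
        "percussion": (0.3, 0.8),   # fades in as things heat up
        "brass":      (0.7, 1.0),   # only near full intensity
    }

    def stem_gains(intensity):
        """Map an intensity in [0, 1] to a per-stem gain in [0, 1] with a linear fade."""
        gains = {}
        for name, (start, full) in STEMS.items():
            if intensity <= start:
                gains[name] = 0.0
            elif intensity >= full:
                gains[name] = 1.0
            else:
                gains[name] = (intensity - start) / (full - start)
        return gains

    print(stem_gains(0.5))   # pads at full, percussion partway in, brass silent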


Thanks, it makes sense to do it that way. Do you know if they use this approach only in games with a scripted scenario, or also in multiplayer settings? In one case they can use static triggers, whereas in the other they would need to measure the player's tension or stress. I mean, the music could get more intense if you are in a battle and your armor and health are low.


That first example is jaw-dropping. It's just like what good musicians do when they're noodling. Damn, well done! Probably the best results I've ever heard for this type of effort.


I'm curious how many tries it took to get that. I've tried chopping up samples from piano music using onset detection and then recombining them programmatically. The results were actually more interesting to me musically, but also not as reminiscent of a traditional classical/romantic piano piece.
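Roughly what I mean, as a sketch (using librosa's onset detector; the file names and the shuffle-based recombination are placeholders, not what I actually used):

    import numpy as np
    import librosa
    import soundfile as sf

    # Placeholder input file; any solo piano recording will do.
    y, sr = librosa.load("piano.wav", sr=None, mono=True)

    # Detect note onsets and slice the audio at those points.
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="samples", backtrack=True)
    boundaries = np.concatenate([[0], onsets, [len(y)]])
    segments = [y[a:b] for a, b in zip(boundaries[:-1], boundaries[1:]) if b > a]

    # Recombine the slices in a shuffled order; a real version would choose
    # segments by some musical criterion rather than purely at random.
    rng = np.random.default_rng(0)
    rng.shuffle(segments)
    sf.write("recombined.wav", np.concatenate(segments), sr)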

So, this is probably the best RNN generated music that I've heard too but overall I'm still not extremely impressed.


I'm passionate about this subject, so here's my take on it: until neural networks can create music better than humans can, this will be nothing more than table talk.

Usually, for something to gain real traction it needs to solve a problem or do something better than current solutions. AI-generated music does neither.

As a music nerd who loves programming and neural networks, I find this stuff very interesting. But I feel that neural networks could be much more useful to composers in other ways.


Yeah, I make electronic music and I'm really interested in algorithmic techniques as an aid to the composer too.

I have Python scripts that generate many minutes' worth of music in an instant, and then I comb through the result and cut out the interesting parts for further processing. It's a really productive technique, and you hit on melodies and rhythms that a human normally wouldn't.
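The gist is something like the sketch below: a random-walk melody over a scale, dumped out as MIDI via pretty_midi. This is just an illustration of the idea, not my actual script:

    import random
    import pretty_midi

    random.seed(7)
    SCALE = [0, 2, 3, 5, 7, 8, 10]               # C natural minor, as an example

    pm = pretty_midi.PrettyMIDI()
    piano = pretty_midi.Instrument(program=0)     # acoustic grand

    t, degree = 0.0, 7                            # start in the middle of the range
    while t < 300.0:                              # roughly five minutes of material
        degree = max(0, min(20, degree + random.choice([-2, -1, -1, 1, 1, 2])))
        pitch = 48 + 12 * (degree // 7) + SCALE[degree % 7]
        duration = random.choice([0.25, 0.25, 0.5, 0.5, 1.0])
        velocity = random.randint(50, 100)
        piano.notes.append(pretty_midi.Note(velocity=velocity, pitch=pitch,
                                            start=t, end=t + duration))
        t += duration

    pm.instruments.append(piano)
    pm.write("random_walk.mid")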


If AI were used as a tool to help composers become more productive, it would be a massive hit. The ability to create new, interesting stuff in less time is valuable.


If not too inconvenient, I'd be interested to hear what tools and workflow you use to do this.


When I hear this I just think of Apple's Drummer in Logic Pro X, and what sort of cool generative tools could be developed that can be trained on different source material. It's really exciting.


This can generate elevator music that will never repeat. I am up for that! (Just getting into ML with the Udacity and Coursera courses. This is fascinating.)


You don't need a neural network for that. https://www.youtube.com/watch?v=esRdmKYucIw


You may be interested in Brian Eno's Reflection:

http://www.generativemusic.com/reflection.html



