Making Music: When Simple Probabilities Outperform Deep Learning (towardsdatascience.com)
137 points by lxm on Sept 8, 2018 | 44 comments



Having worked on procedural music for some time, I really can't see how simple Markov-style approaches like this are likely to have interesting results. You can certainly create sequences of notes that have statistical similarities to the notes in a training set, but we don't hear statistical similarities, we hear phrases and refrains and call-and-responses and whatnot - higher-order structures that Markov models seem really ill-suited to.

It strikes me as analogous to generating text from an ML model trained on a bunch of short stories. A short snippet of the results may sound convincing, but obviously a whole story generated that way would be gibberish. I think the same sort of thing is happening in the example songs - any given chord change or three-note phrase may sound musical, but over the course of several measures it's not distinguishable from random notes.
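
For concreteness, here's roughly the kind of model being criticised - a toy first-order Markov chain over note names, with made-up training data (a sketch, not the article's code). Every step conditions only on the previous note, which is exactly why phrases and refrains never emerge:

    import random
    from collections import defaultdict

    # Toy corpus: each "song" is just a list of note names (hypothetical data).
    songs = [
        ["C", "E", "G", "E", "C", "D", "E", "C"],
        ["C", "D", "E", "F", "G", "F", "E", "D", "C"],
    ]

    # Count first-order transitions: note -> next note.
    transitions = defaultdict(lambda: defaultdict(int))
    for song in songs:
        for a, b in zip(song, song[1:]):
            transitions[a][b] += 1

    def generate(start="C", length=16):
        """Sample a sequence by walking the transition table."""
        seq = [start]
        for _ in range(length - 1):
            options = transitions[seq[-1]]
            if not options:          # dead end: fall back to the start note
                seq.append(start)
                continue
            notes, counts = zip(*options.items())
            seq.append(random.choices(notes, weights=counts)[0])
        return seq

    print(generate())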


I think it might, but not without much more complexity. As you point out, we hear phrases and refrains etc. - you'd need to generate phrases, and then generate structure by reusing and transforming (mutating, transposing, inverting, etc.) those phrases.

Listening to his examples, what struck me was exactly that: you could hear the outline of something, but it was killed instantly by the lack of internal consistency. What at first sounded like the outlines of phrases were self-similar, sure, but there was way too little outright repetition.

It also fell flat at capturing well-established rules of chord progressions - I have the same impression as you: individual changes were OK, but even with my very, very superficial understanding of composition I know that certain progressions add tension and others release it, and pairs of chords that work well together sound just plain weird if they're not part of a bigger structure that also follows the rules.

I'm reminded of a program printed in Compute! or similar in the '80s that was meant to let anyone play music. It worked surprisingly well. The way it worked was simply to only allow note progressions that were likely to result in something that at least sounded "like music". But you still needed to understand rules of composition to go from something "like music" to something that sounded pleasant.

I feel like his example falls in that category: He's made it sound "like music", but it's something most people can beat by buying a book on improvisation that explains a few simple chord progressions and some very basic rules. Going from that to something that sounds like an actual proper piece of music is another matter.


Don't worry, the RNN-generated version doesn't even produce a snippet of music. Calling it music is being generous; at best it's noodling.

On the other hand, Markovian models can segment and learn grammars (specifically, probabilistic context-free grammars). The problem with those is overfitting.
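
For illustration, here's a toy probabilistic CFG sampled top-down. The grammar and probabilities are written by hand purely to show the shape of the idea; in the learned setting both would be induced from data, which is exactly where the overfitting bites:

    import random

    # Hand-written toy PCFG for phrase structure (illustrative only).
    # Each nonterminal maps to a list of (probability, expansion) pairs.
    grammar = {
        "SONG":    [(1.0, ["SECTION", "SECTION"])],
        "SECTION": [(0.5, ["PHRASE", "PHRASE"]),    # AA: literal repetition
                    (0.5, ["PHRASE", "ANSWER"])],   # AB: call and response
        "PHRASE":  [(0.6, ["motif1"]), (0.4, ["motif2"])],
        "ANSWER":  [(0.7, ["motif1_var"]), (0.3, ["cadence"])],
    }

    def expand(symbol):
        """Recursively sample a derivation; terminals pass through unchanged."""
        if symbol not in grammar:
            return [symbol]
        probs, expansions = zip(*grammar[symbol])
        chosen = random.choices(expansions, weights=probs)[0]
        result = []
        for s in chosen:
            result.extend(expand(s))
        return result

    print(expand("SONG"))   # e.g. ['motif1', 'motif1', 'motif2', 'cadence']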


>You can certainly create sequences of notes that have statistical similarities to the notes in a training set, but we don't hear statistical similarities, we hear phrases and refrains and call-and-responses and whatnot - higher-order structures that Markov models seem really ill-suited to.

You can have Markov models at various levels of granularity...


> You can have Markov models at various levels of granularity...

This is neither here nor there. The article already applies models to two separate levels of granularity (chord progressions, and melody notes over a given chord). But for higher-order structures the hard part is decomposing source music into data points that typify its structure - if you did that, then applying a Markov (or other) model to the data would be straightforward.
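
As a concrete toy illustration of those two levels (made-up probability tables, not the author's actual model): one chain over chords plus per-chord note distributions for the melody. Note that neither level remembers bar 1 by the time it reaches bar 8, which is the decomposition problem above:

    import random

    # Level 1: Markov chain over chords. Level 2: note choice given the chord.
    chord_transitions = {
        "C":  {"F": 0.4, "G": 0.4, "Am": 0.2},
        "F":  {"C": 0.5, "G": 0.5},
        "G":  {"C": 0.7, "Am": 0.3},
        "Am": {"F": 0.6, "G": 0.4},
    }
    notes_given_chord = {
        "C":  {"C": 0.4, "E": 0.3, "G": 0.3},
        "F":  {"F": 0.4, "A": 0.3, "C": 0.3},
        "G":  {"G": 0.4, "B": 0.3, "D": 0.3},
        "Am": {"A": 0.4, "C": 0.3, "E": 0.3},
    }

    def sample(dist):
        keys, weights = zip(*dist.items())
        return random.choices(keys, weights=weights)[0]

    def generate(bars=8, notes_per_bar=4, chord="C"):
        song = []
        for _ in range(bars):
            melody = [sample(notes_given_chord[chord]) for _ in range(notes_per_bar)]
            song.append((chord, melody))
            chord = sample(chord_transitions[chord])   # chord-level Markov step
        return song

    print(generate())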


You can, but OP didn't.

And you need to understand what you're Markovising - which apparently OP didn't either.


OP built a whole-song self-similarity matrix and regularised on that, which forces the model to capture this kind of high-level structure.
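
For anyone curious what that looks like, here's a rough sketch of a whole-song self-similarity matrix (my own toy version, not OP's code): each bar becomes a pitch-class histogram, and cosine similarity between every pair of bars makes repeated sections show up as bright off-diagonal stripes, which a regulariser can then push a generated song towards:

    import numpy as np

    def pitch_class_histogram(bar_midi_pitches):
        """Represent a bar as a 12-bin pitch-class histogram."""
        hist = np.zeros(12)
        for p in bar_midi_pitches:
            hist[p % 12] += 1
        return hist

    def self_similarity(bars):
        """Cosine similarity between every pair of bars (n_bars x n_bars)."""
        feats = np.array([pitch_class_histogram(b) for b in bars], dtype=float)
        norms = np.linalg.norm(feats, axis=1, keepdims=True)
        feats = feats / np.clip(norms, 1e-9, None)
        return feats @ feats.T

    # Toy song of four bars, where bar 0 and bar 2 are exact repeats:
    song = [[60, 64, 67, 64], [62, 65, 69, 65], [60, 64, 67, 64], [59, 62, 67, 62]]
    print(np.round(self_similarity(song), 2))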


Neither transition tables nor naive self-similarity searches are an effective way to model this domain.


What's the best GitHub repo to get started with generating music from WAV file data? And how much GPU memory do I need - 6, 11, or 24 GB?


20-30 years ago, groups/artists like Autechre were already making generative music much more musical than this, sometimes without even using computers. The metrics in this article seem wrong, but even without any metrics the clips sound like something a first-year computer-music student could throw together over a weekend in Max/MSP. The result is bad from a music perspective, and the method is uninteresting from a data-science/machine-learning perspective.


Two problems come to mind when writing a good music generator:

1) What's the metric? Can we have a metric that's not based on human judgment? IMHO the best idea is a parametrized generator that creates music within manually set bounds; that way we have a goal to compare against (while still varying the music enough that it's new). You supply the length, the structure (draw the ups and downs of your piece), maybe even what chords to use, etc. (however many parameters you find necessary), and then set how much variety you're looking for from your trained net.

2) Good music is often like a structure within a structure within a structure. It has internal links several bars apart that are out of reach for simple Markov chains; there are patterns and progressions at every level of the music, and a slight change to a pattern can make a big difference. Every abstraction level has its own logic (previous chord vs. next chord, previous melody vs. next melody, previous ornament vs. next ornament), and if you omit one layer, the whole tower of music falls out of balance.


So, the guy just put randomly generated notes over a chord progression... and then created his own metric stating that it sounds better?


It's a little more complex than that. The metric I've created builds on an existing literature of self-similarity matrices vis-à-vis music, though at this point I suspect that line of work has gone quiet due to the rise of GANs for signal extraction. If you have any suggestions of your own on how to objectively compare generated elements, I'd appreciate hearing them.


I've been attending a lot of EDM festivals lately. I can't help but think popular music has become much more timbre-based as opposed to harmony-based, owing to complex human subjectivity. It's much harder to model timbre against harmony than melody against harmony.


It's not due to the complex subjectivity. It's due to the medium, i.e. DAWs, whose UIs discourage complex time and harmonic arrangement in favour of massive sound design.


That's an interesting point, but I don't think it explains why music as a whole is going where it's going: from melodies and harmonies towards a more general art of sound. After all, DAWs are created to accommodate the need, not the other way around. One of the reasons it's happening may be that harmonies are pretty much exhausted and most music in this sense is just a reiteration of the past. Apart from some innovation in jazz and progressive rock, both of which are stagnating (i.e. practically dead), nothing else is new in terms of melodies and harmonies.


> After all, DAWs are created to accommodate the need, not the other way around.

DAWs were created to serve that need, sure. But for a lot of artists nowadays, their whole window into the music-making world is their DAW, and as such the roles have somewhat reversed, with the DAW serving as inspiration for the work and setting its limits.

I remember reading a paper a few years ago showing that just the color of the skin used for your DAW had an effect on the music you produced. Humans are fairly influenceable.


> I remember reading a paper a few years ago showing that just the color of the skin used for your DAW had an effect on the music you produced.

I'd love a link for this.


I can't find exactly the one I'm thinking of, but here are a few on this topic:

- a whole PhD on exactly the subject of this debate: https://yorkspace.library.yorku.ca/xmlui/bitstream/handle/10...

- http://www.arpjournal.com/asarpwp/experiencing-musical-compo...

- https://www.researchgate.net/profile/Josh_Mycroft/publicatio...


The DAW was a tool used to edit sound.

Now the DAW is an instrument.

That's the difference.

From a classical producer's perspective, we could say that the 'new music' now is a function of production.

A slightly different view would simply be to say that the Engineer/Producer is the artist.

Someone made a comment about rock/jazz not evolving much... well, you need people who can actually play instruments! It takes massive work, talent, skill - people who isolate themselves for long periods. I know being a DAW artist requires work... but it's work of an altogether different kind.

I appreciate soundscape art, but not as much as live music.

FYI, though maybe Brian Eno et al. were the first to really take advantage of the tech, I have to think of DJ Sasha's album Airdrawndagger as kind of seminal. It was fully a producer's album.


The DAW isn't really an instrument. All DAWs are a combination of (tape-like) linear recording, with a sampling rack, a synth rack, and an FX rack attached.

The instruments are the samplers, synthesizers, loops, and FX.

The proportions vary - Ableton plays up the sampling, ProTools plays up the linear tape and FX - but they're all more or less the same product, which is a digital implementation of a late 1980s recording studio with project recall and much more powerful automation.

To use a DAW as an instrument the design would have to be opened up and made much more configurable and programmable. Products that do this a little exist (e.g. M4L in Ableton, which is descended from Max/MSP, which is used by Autechre) but they're always limited and/or clumsy to program and invariably not very popular.

It turns out most DAW users are very comfortable with the studio metaphor and only a tiny minority have any interest in exploring beyond it. They either find their way to modular synthesisers, which are a different kind of dead end, or Max/PD, which (IMO) are a bit of a nightmare to use (PD less than Max), or maybe one of the code based environments like SuperCollider, which are much more open but so user-hostile you need a good grounding in CS and command-line development to use them at all.

To date no one has made a code-based album that has had any mainstream success.


I'm not really sure what the definition of an instrument is here, but many electronic artists are using Ableton to perform live. If we're willing to call turntables an instrument, then we need to call Ableton one as well. A quote from the popular bass music producer G Jones:

"What do you use to produce & perform? I use Ableton Live for both. I use almost solely Ableton’s built in synths/effects in my productions. My live setup is a sort of DJ-style set, in that I mainly mix using 2 audio channels and then have several more channels of acapellas, drum loops, risers, 1 shots, drum samples, etc."

https://www.youredm.com/2014/02/13/feature-g-jones/


Even though you can use Ableton to 'perform live', this is rather raw. When I say 'as an instrument' I mean as the conduit to creativity. The primary creatives in music would have been the songwriter/composer, lyricist, performer. Now it's moved into production, and the mechanism is not a guitar or piano; it's a variety of digital tools upon which creativity/composition is performed.


I'm very familiar with the tech, and I understand my comment about the DAW being an instrument is a little metaphorical... but engineering/production is now where the material creativity happens.

Obviously it's not happening in 'real time' - rather, it's done by tuning each bar and each phrase, applying different effects, etc.; hence they are 'playing that instrument'.

Songwriting, melody, possibly musicianship, and definitely lyrics are kind of secondary. You can say anything in a song these days; it doesn't matter.


"all models are wrong, but some are useful" George E.P. Box.

I like the way he presented his explanation, and his approach to finding a solution.

However, listening to some of the songs, I would say that he needs to find a better model - or, as Mr. Burns would say: "Smithers, continue the research!"


It's unfortunate that the author didn't feel much need for humility. He worked on some pretty cool stuff; it would have been great to present it differently.


IMHO, the author doesn't understand the subject domain deeply enough to address it.


Darn, I was hoping my 20 years of musical experience would give me enough domain knowledge. If there are specific gaps in my understanding of music theory, please let me know; I'd love to learn.


I'm sorry to have offended, but I stand by my statement. IMHO, you've mistaken the guidelines of musical grammar for the rules of musical composition.


Hi! The author here. I would love to see what that different presentation would've been, so that the language of my next project can reflect a humbler tone. Thanks!


Without any human assessment or blind testing, claiming one method outperforms another using an ex post facto unsupervised metric is not scientific.


The problem with algorithmic music creation is that compositional models are always going to be limited to baked-in technical parameters. If you're working from analysis, your analyzer needs to be able to recognize a compositional technique before it returns any useful data about it, which means it needs to recognize a wide range of techniques, even for pop music.

Is this project the result of a high school computer music assignment?


Hey guys! The original author of the article here. I didn't know my article was generating so much attention here. There are great compliments, criticisms, and external points - more than I have hands to address individually. But if you have specific questions, please send them my way. I would love to further the discussion in hopes of expanding the frontiers of my project! Thank you --Haebichan.


I really respect & enjoyed your post. I've been experimenting with similar ideas but in my case I've used multiple RNN & CNN trained for different aspects of the music. For example train one NN just to identify compatibility between bass lines and rhythms. One NN to generate chord progressions. One RNN to generate melodies on top of the chords etc. check it out:

http://treblemaker.ai/

https://github.com/shiehn/TrebleMakerDocker
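
Very roughly, that division of labour looks like this - just an illustrative skeleton with stand-in functions where the trained networks would sit, not TrebleMaker's actual code:

    import random

    def generate_chords(bars=4):
        """Stand-in for the chord-progression network."""
        return random.choices(["C", "F", "G", "Am"], k=bars)

    def generate_melody(chords, notes_per_bar=4):
        """Stand-in for the melody RNN, conditioned on the chords."""
        chord_tones = {"C": "CEG", "F": "FAC", "G": "GBD", "Am": "ACE"}
        return [[random.choice(chord_tones[c]) for _ in range(notes_per_bar)]
                for c in chords]

    def bass_rhythm_compatible(bass_line, rhythm):
        """Stand-in for the compatibility classifier: trivially checks lengths."""
        return len(bass_line) == len(rhythm)

    chords = generate_chords()
    print(chords, generate_melody(chords))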


If anyone is interested in taking these explorations further, CrowdAI has a long-running AI music generation challenge. The input to the black box you design is a single MIDI sample, and the output is a generated MIDI file that extends and "riffs" on that initial piece. The win condition is based simply on which piece sounds better to the judges ;)

https://www.crowdai.org/challenges/ai-generated-music-challe...

Also check out the recent work from Google's Magenta team. Their MusicVAE seeks to model not just the instrument but the expressiveness of the musician, with "style" emerging from the MIDI representation alone.

Latent Loops Beta

https://teampieshop.github.io/latent-loops/


I've been exploring some of these ideas as well. I'm almost done with my product; sign up at http://ngrid.io if you want to be notified when it's ready.


I would love to learn more about how your model works. Would appreciate an explanation!


Yeah, trying to make an RNN learn simple (but "complicated") priors is an exercise in how not to do deep learning.

Just think how big a model would have to be to capture self-similarity between two different parts of a song.


Unfortunately, a bigger, more complex RNN learns the songs by heart and doesn't generalise. Beta-VAE units seem essential, and they're not doing well either.


Link for the lazy: http://popmusicmaker.com/


Aaand it's dead. Drat.

(500 internal server error when clicking on Generate Pop Music.)


Yeah, the machine is on a t2.large EC2 instance. It just got too many server requests, I think. Some people haven't been following directions either, like submitting empty requests for lyric generation, and for some reason I can't pin down in the code, that's crashing my EC2.


Oh! I didn't realize I needed to specify anything before clicking "Generate". That could have been why.


No problem. And have fun! It was super fun to make this project.



