Music and Machine Learning

fenomas · on Aug 24, 2018

> The music generated by wavenet clearly sounds like a piano, but lacks compositional structure that most people might be able to follow. I suspect a significant architectural change will be needed for music for reasons discussed in this article.

As someone who's worked a lot on procedural music, I think this is definitely true. I'm always surprised to see ML-based approaches where someone has just trained a system on a bunch of songs and then hopes the system will produce music with a recognizable structure, even though the training songs will (in general) have had different chord progressions, different numbers of voices or melodic lines at any given time, etc. Such approaches strike me as akin to training a system on a bunch of short stories and then hoping it will produce a new story with a recognizable plot.

It seems like it would make a lot more sense to remove these hidden dimensionalities, e.g. by annotating the source data with chord or structural information, or by training on lots of different melodies that all share the same chord progression. It's hard to imagine that, with enough layers alone, the network will eventually grok all these hidden details.
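One way to read the "annotate the source data with chord information" idea: store each melody note as an interval above the root of the chord sounding under it, so melodies over different progressions land in a shared vocabulary. A minimal sketch, assuming notes are MIDI note numbers and that per-note chord roots (pitch classes 0-11) come from some hypothetical annotation step; `relative_to_chord` is an illustrative name, not an existing library function.

```python
from typing import List

def relative_to_chord(melody: List[int], chord_roots: List[int]) -> List[int]:
    """Map each MIDI note to its interval (0-11 semitones) above the
    pitch class of the chord root sounding under it."""
    if len(melody) != len(chord_roots):
        raise ValueError("need one chord root per melody note")
    return [(note - root) % 12 for note, root in zip(melody, chord_roots)]

# The same arpeggio shape over C major (root 0) and over G major (root 7)
# reduces to identical chord-relative intervals.
in_c = relative_to_chord([60, 64, 67, 72], [0, 0, 0, 0])
in_g = relative_to_chord([67, 71, 74, 79], [7, 7, 7, 7])
print(in_c, in_g)  # [0, 4, 7, 0] [0, 4, 7, 0]
```

The design choice is to throw away the absolute key entirely and keep only the note's function relative to the harmony, which is exactly the kind of hidden dimension a model would otherwise have to infer on its own.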

lwansbrough · on Aug 24, 2018

I believe the first company to create a very successful cloud-based DAW will have the greatest opportunity to vacuum up this data. If producers were pushing real, original music source data into your neural network, you'd basically eliminate the need to do any sort of waveform analysis. Everything turns into discrete numbers, which neural networks are really good at handling, vs. the destructive noise of common music files. (120 BPM music could conceivably be 2 inputs per second vs. 44,100.)
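The parenthetical's arithmetic, spelled out: at 120 BPM there are 120 / 60 = 2 beats per second, while CD-quality audio carries 44,100 samples per second per channel. A quick sketch comparing the two for a whole song; the 3-minute length and the one-event-per-beat assumption are illustrative, not from the comment.

```python
SAMPLE_RATE_HZ = 44_100  # CD-quality audio: samples per second, per channel

def beats_per_second(bpm: float) -> float:
    return bpm / 60.0

def event_counts(bpm: float, seconds: float) -> tuple:
    """Return (beat-level symbolic events, raw mono audio samples),
    assuming one symbolic event per beat."""
    beats = beats_per_second(bpm) * seconds
    samples = SAMPLE_RATE_HZ * seconds
    return beats, samples

beats, samples = event_counts(120, 180)  # a 3-minute song at 120 BPM
print(beats, samples, samples / beats)   # 360.0 7938000 22050.0
```

Real symbolic data has more than one event per beat (several voices, subdivisions), but the gap stays several orders of magnitude.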

Edit: As a matter of fact, you don't even need a whole DAW, really. You just need to be able to read existing DAW files and give users a reason to upload them.

fenomas · on Aug 24, 2018

Converting sounds to numbers is certainly nontrivial, but the "ML-for-music" projects I've seen were generally already working with MIDI.

I think the deeper problem is that the aspects of musical structure that are obvious to the listener (chord progressions, modulations, etc.) are realistically going to vary too much across the training data for any ML approach to figure out.

jerrre · on Aug 24, 2018

I think a lot of DAW work is still done with wave data, not note data...

igotsideas · on Aug 24, 2018

Are you talking about the creation of music or gathering music data for ML purposes?

scjody · on Aug 24, 2018

blend.io seems to be on the right track towards giving people a reason to upload their music.

gcb0 · on Aug 24, 2018

You are assuming all music out of a DAW is good.

8bitsrule · on Aug 24, 2018

This is a set of valuable, experienced reflections on the task anyone faces who wants to make music (let alone get a machine to do it).

Much to digest here, but I especially like this thought: "There are no words in music." A whole lot of people think that music is nothing more than a carrier they have to modulate with some important message.

bhrgunatha · on Aug 24, 2018

Enjoyed the article very much, especially the part about music not being a language and the dive into syntax and semantics, but this...

> The only way to recognise which pop song (from one popular training set) this score comes from is the lyrics. The melodic line has little similarity to the original recording.

...strikes me as plain nonsense. I can play that line on my guitar or keyboard and know immediately which tune it is even without the lyrics.

Am I misunderstanding something?

tommymachine · on Aug 24, 2018

The notes / rhythm in that particular score aren't accurate to the recording; only the lyrics are. If you played it as written there, it wouldn't be so recognizable. Personally, I don't much fancy calling something so inaccurate a 'score'. More like a 'rough suggestion' or something. A score is supposed to be something like what Hans Zimmer would write for an orchestra. Pop songs typically don't have scores involved in their production anyway, which may be partially responsible for the inaccuracy. But the real responsible party is the person who transcribed it wrong!

sacado2 · on Aug 24, 2018

Well, take any jazz standard. As you say, scores in jazz are at most a rough guide, not transcription of how the music should be played. Yet, if you take a score and make a MIDI player play it as it is written, any jazz fan will recognize the tune instantly.

For instance, take

http://i17.tinypic.com/4uneoft.jpg

Remove the title, add a mistake or two to the score, and give it to a jazz fan who can read music: he'll recognize the tune instantly, even though nobody has ever played it this blandly.

tommymachine · on Aug 24, 2018

They'll recognize the chords instantly, without even having to glance at the printed notes. In practice, these things are usually called "charts" or lead sheets (if they have a melody) because they chart out the changes, which you solo over. They usually are found in "fake books", where it's assumed that the melody transcription is relatively awful, but it's close enough that a decent musician can quickly figure out the actual tune from it (assuming they've heard it before).

You'll almost never hear any seasoned jazz players refer to a chart like this as a "score", unless they have a very very classical background, or are sort of making a joke about the quality of a particular lead sheet. Scores are for orchestras and films and things like that. It's strange to see people keep referring to these sh*tty transcriptions as 'scores'.

It's like calling Kraft Mac 'n' cheese "pasta". Like, ok maybe technically it could qualify as a pasta, but you don't really refer to it as that in practice.

microtherion · on Aug 24, 2018

In fact, they may find it easier to recognize that tune from this transcription, than from some of the actual recordings: https://www.youtube.com/watch?v=5cZG2WnXPgk

bhrgunatha · on Aug 24, 2018

I agree the transcription is awful, but would you say it's unrecognisable?

tommymachine · on Aug 24, 2018

taeric · on Aug 24, 2018

What about folks that get words in songs wrong? Quite a few folks would know exactly what song I meant when I said the one with the verse "There's a bathroom on the right" even though that is wrong to the point of idiocy.

That is to say, I think you don't give people good enough credit if you think a bad transcription will preclude many from being able to recognize something.

tommymachine · on Aug 24, 2018

A few misheard lyrics here or there is quite different from 9 / 10 notes having a different rhythm, as in the article's example.

taeric · on Aug 24, 2018

You'd be surprised. There is a reason most people can tell, just from the lyrics, which song a parody is based on.

Similarly, a novice butchering a song on guitar is likely getting it completely wrong, but people will still recognize it if the general shape of the song is preserved. Heck, most novices' guitars probably aren't tuned properly.
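One established way to capture a tune's "general shape" while ignoring exact pitches and tuning is the Parsons code, which records only whether each note goes up (u), down (d), or repeats (r) relative to the previous one, with `*` for the first note. A minimal sketch over MIDI note numbers (fractional values stand in for out-of-tune pitches); the function name is this sketch's own.

```python
from typing import Sequence

def parsons_code(pitches: Sequence[float]) -> str:
    """Encode a melody's contour: '*' for the first note, then
    'u' (up), 'd' (down) or 'r' (repeat) for each following note."""
    if not pitches:
        return ""
    code = ["*"]
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            code.append("u")
        elif cur < prev:
            code.append("d")
        else:
            code.append("r")
    return "".join(code)

# Opening of "Twinkle, Twinkle, Little Star" (C C G G A A G), in tune...
in_tune = [60, 60, 67, 67, 69, 69, 67]
# ...and the same phrase on a guitar tuned about a third of a semitone flat.
out_of_tune = [p - 0.3 for p in in_tune]
print(parsons_code(in_tune))      # *rururd
print(parsons_code(out_of_tune))  # *rururd
```

A uniform detuning shifts every pitch but leaves the contour untouched, which is one concrete reason a badly tuned rendition can still be recognizable.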

tommymachine · on Aug 24, 2018

Most guitars aren't in tune period [1].

1. https://www.youtube.com/watch?v=D8EjCTb88oA

montalbano · on Aug 24, 2018

> Music is not a language

I think almost anyone who's spent significant time jamming/improvising with other musicians will disagree.

The statement is an oversimplification. It is an oversimplification that may be necessary for music to become tractable to machine learning algorithms, but it does more to highlight the limitations of current understanding of AI than anything else.

Otherwise a very interesting read!

toolslive · on Aug 24, 2018

> Most people can’t tell you why they like the music that they like. Not with enough resolution to accurately predict their preference for new tracks

Won't most people be able to tell you with good resolution why they don't like the music they don't like?

nerdponx · on Aug 24, 2018

Sometimes. They might be able to identify which aspect of the music they don't like, but I don't know if they are able to explain "why" they don't like it. But I feel the same is true on the positive side. Maybe it's worth trying to figure out what part of the music people like ("John Denver has a great voice"), rather than some nebulous reason why they like it.

okonomi · on Aug 24, 2018

Not machine learning, but this is an extremely cool generative approach to emulating sounds on a synthesizer: https://fo.am/midimutant/

BillBohan · on Aug 24, 2018

These are some interesting observations regarding music and machine learning. It has been my experience that the majority of the output of ML music generators falls into the category I would classify as noise.

I briefly experimented with procedural music generation many years ago and will relate my experience in the hope that some may find it interesting or take inspiration from it.

I had read the Byte magazine article "A Travesty Generator for Micros", which describes a program that works with text files, and realized that Markov chains could be applied either to whole words or to individual letters. Sufficiently long chains of letters almost always produce actual words. Sufficiently long chains of words generally produce complete (although nonsensical) sentences. Excessively long chains copy the input to the output. See [1] and [2].
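A minimal Python sketch of a table-driven ("array method") order-k Markov chain that works on letters or on words depending on how the input is tokenized. It is not the Byte listing itself: the names `build_table` and `travesty`, and treating the input as circular so no context is a dead end, are assumptions of this sketch.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def build_table(tokens: Sequence[str], order: int) -> Dict[Tuple[str, ...], List[str]]:
    """Scan the input once and record every token that follows each run of
    `order` tokens. The input is treated as circular, so every context that
    can arise during generation has at least one successor."""
    if order < 1 or len(tokens) <= order:
        raise ValueError("need 1 <= order < len(tokens)")
    wrapped = list(tokens) + list(tokens[:order])
    table: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for i in range(len(tokens)):
        table[tuple(wrapped[i:i + order])].append(wrapped[i + order])
    return table

def travesty(tokens: Sequence[str], order: int, length: int, seed: int = 0) -> List[str]:
    """Generate `length` tokens, each sampled from the successors of the
    previous `order` tokens (duplicates in the table weight the choice)."""
    rng = random.Random(seed)
    table = build_table(tokens, order)
    out = list(tokens[:order])  # start from the input's opening context
    while len(out) < length:
        out.append(rng.choice(table[tuple(out[-order:])]))
    return out[:length]

text = "the cat sat on the mat and the rat sat on the cat"
print("".join(travesty(list(text), 3, 60)))   # letter-level chain
print(" ".join(travesty(text.split(), 1, 12)))  # word-level chain
```

Raising `order` toward the input length makes every context unique, so the output converges on a copy of the input, matching the behaviour described above.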

At the time I was playing LOTRO [3][4], which uses ABC files [5], a text representation of music. I used the .abc files as input to the travesty program and got very interesting output. I used the rescan method, which rereads the input file for each note it outputs. It is slow but uses far less memory than the array method, which reads the input once and generates a complete table of all transitions.
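For comparison, a sketch of a rescan-style generator: no transition table is built, and each output token costs a full pass over the input. Picking uniformly among all matches with one-item reservoir sampling is this sketch's own choice (it avoids storing the matches) and may differ from how the Byte program chose its successor; `travesty_rescan` is a hypothetical name.

```python
import random
from typing import List, Sequence

def travesty_rescan(tokens: Sequence[str], order: int, length: int, seed: int = 0) -> List[str]:
    """Keep no transition table: for every output token, scan the whole
    (circular) input for occurrences of the current context and pick one
    of the following tokens uniformly at random."""
    n = len(tokens)
    if order < 1 or n <= order:
        raise ValueError("need 1 <= order < len(tokens)")
    rng = random.Random(seed)
    out = list(tokens[:order])
    while len(out) < length:
        context = out[-order:]
        choice = None
        seen = 0
        for i in range(n):
            # Compare the context with tokens i..i+order-1, wrapping around.
            if all(tokens[(i + j) % n] == context[j] for j in range(order)):
                seen += 1
                # Reservoir sampling: keep the k-th match with probability 1/k,
                # which gives every occurrence an equal chance.
                if rng.randrange(seen) == 0:
                    choice = tokens[(i + order) % n]
        out.append(choice)  # a match always exists because the input wraps
    return out[:length]

tune = list("CDEC CDEC EFG EFG")  # toy note-letter sequence, not real ABC
print("".join(travesty_rescan(tune, 2, 30)))
```

The trade-off is the one described above: O(len(input)) work per output note in exchange for not holding a table of every transition in memory.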

Running travesty on a single .abc file produces output that is very similar to the input and only mildly interesting. Chaining together 2 or more input files is when it gets more interesting. It did not work well unless the input files had the same key signature.

I considered transposing all input files to a common key signature but did not implement it. Nearly all music representation is an abstraction of the music. Music is generally quantized into notes of the 12-note equal-tempered scale, and a tune is recognizable regardless of the instrument it is played on. I wondered whether there were further abstractions that could be used, similar to the way either letters or words can be used for text, but I am not musical enough to discover them.
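A sketch of the transposition idea, assuming melodies are held as MIDI note numbers and each file's tonic is already known as a pitch class (in ABC it would come from the `K:` header field, but parsing that is not shown). Because equal temperament spaces all 12 semitones evenly, transposing is just adding a constant to every note number; `midi_to_hz` makes that spacing concrete. All function names here are hypothetical.

```python
from typing import List

def shift_to_c(tonic_pc: int) -> int:
    """Smallest semitone shift (-5..+6) that moves pitch class `tonic_pc` to C (0)."""
    shift = (-tonic_pc) % 12          # 0..11 semitones upward
    return shift - 12 if shift > 6 else shift

def transpose_to_c(notes: List[int], tonic_pc: int) -> List[int]:
    """Transpose MIDI note numbers so the tune's tonic becomes C."""
    s = shift_to_c(tonic_pc)
    return [n + s for n in notes]

def midi_to_hz(note: float) -> float:
    """12-tone equal temperament with A4 (MIDI 69) = 440 Hz: each semitone
    multiplies frequency by 2 ** (1/12)."""
    return 440.0 * 2 ** ((note - 69) / 12)

# D E F# in D major (tonic pitch class 2) becomes C D E.
print(transpose_to_c([62, 64, 66], 2))  # [60, 62, 64]
print(round(midi_to_hz(60), 2))         # 261.63 (middle C)
```

Mapping every tonic to C puts minor-key tunes in C minor rather than on C major's key signature; mapping minor tunes to A minor instead would be a small variant.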

If you try this I think you will quickly get results which encourage you to continue.

[1] https://en.wikipedia.org/wiki/Parody_generator

[2] http://runme.org/project/+travesty/

[3] https://en.wikipedia.org/wiki/The_Lord_of_the_Rings_Online

[4] http://www.lotro.com/en?

[5] http://abcnotation.com/