To the uninitiated these may seem like reasonable results, because to the uninitiated hip hop is just a bunch of unintelligible words strung together anyway, but these lyrics don't make much sense. There's no cohesive stream of thought behind the lines. It's a semirandom juxtaposition of phrases, and a great example of one of the biggest limitations of deep learning software: even the best systems can maintain only very limited context. This is not just a matter of scaling up to larger networks; current methods require exponentially larger networks to achieve linear increases in context handling. So why can our brains do it? The answer is simply that our brains are more than neural networks.
Because current NNs only simulate, like, less than 1 mm^3 of brain matter. Someone writing lyrics for a song has millions of such tiny networks working concurrently in their brain, and then there are higher-level networks supervising and aggregating the smaller nets, and so on.
Current AI NN architectures are flat, with no high-level structure. There's no hierarchy. There's no plurisemantic context spanning large time intervals and logic trees. No working memory organized into short-, mid- and long-term stores. Etc., etc., etc.
Further, the brain uses subsystems whose architectures are good at the specific problem they're solving, with smooth integration that somehow often preserves context. The approach of current ANNs seems to be the equivalent of taking the visual cortex, or some other single subsystem, and trying to apply it to every other subsystem's area of expertise. It will do interesting things but always fall short, since it's architecturally not the right tool for the job.
It's why my money is on the humans for the upcoming Starcraft challenge. As in poker, bluffing is integral to the process, and AIs have had a really hard time with that in poker outside of constrained games. Starcraft provides enough open-ended opportunities that the AI will get smashed by a clever human. Hip hop similarly combines tons of references to pop culture, the psychology of what goes with what, coordination of presentation, and established musical tricks. ANNs suck at at least half of those by design.
1. The game does not really depend on a larger context. What you see is what you get. The "muscle memory" of a relatively simple ANI could therefore be enough. This partially contradicts what you said above about bluffing, but I feel the contradiction is less than 50%.
2. Simple "muscle memory" strategies should not be enough to win the game, but the ANI's lightning speed reactions and its ability to see the whole game at once are enough to outperform more sophisticated-thinking humans who are slower and have tunnel vision w.r.t. the game. Basically the brute-force approach.
I'm not placing bets, and I'm as curious as everyone else as to the result of the contest. I'm just saying - if the AI does win, these are the ways it could do that.
I'm using the expression "muscle memory", inadequate as it is, because I have no better way to express how current NNs operate. They are dumb at the higher semantic levels. They only become powerful through colossal repetition and reinforcement.
Watching current NNs being trained never fails to give me flashbacks to my college days when I was practicing karate. We would go to the dojo, pick a technique, and then repeat it an enormous number of times, to let the sequence sink into muscle memory. I'm sure some (natural) NNs in my brain still have that stuff down pat: I don't have to think about doing the techniques, they just "execute" on their own. But there's no semantic level here; it's just dumb (but blazing fast) automation.
It's possible, but with a human involved it's a lot more context-driven than you think. The bots do great against each other with their strategies and muscle-memory stuff. Throw a human in the mix and they start noticing patterns in how the units are managed, what strategies are likely, etc. Humans have exploited these in the competitions to hilarious effect. Here's the main site on prior work & results at the competitions:
Here are a few examples of how the muscle-memory approach, especially if focused on unit-vs-unit play, can fail against humans.
" In this example, Bakuryu (human) notices that Skynet's units will chase his zerglings if they are near, and proceeds to run around Skynet's base to distract Skynet long enough so that Bakuryu can make fying units to come attack Skynet's base. This type of behaviour is incredibly hard to detect in a bot, since it requires knowledge of the larger context of the game which may only have consequences 5 or more minutes in the future. " (2013)
Note: At this point, they also sucked at building and expansion strategies which surprised me since I thought a basic planner would be able to do that. The constraints between rate of expansion, where to place stuff, what units to keep/build, and so on get really hard. The other thing they weren't good at was switching strategies mid-game based on what opponents are doing.
" despite Djem5 (pro) making the bots look silly this year... they were able to defeat D-ranked, and even some C-ranked players. After the human players have played one or two games against the bots they are then easily able to detect and exploit small mistakes that the bots make in order to easily win the majority of games... " (2015)
I don't have the 2016 results yet. It's clear they're getting better, but there's still a huge gap between bots and humans. The gap seems to be context, reading opponents, and expansion. Now, if they can fix those, then combined with the machine's inherent strength in micromanagement and muscle memory for specific attack/defense patterns, they have a chance of taking down pros.
Below are examples of the AIs managing units in perfect formation & reaction. It's like an emergent ballet or something. The second one could be terrifying if they can get it into real-world military use. Figured you might enjoy these given your position. :)
I'm pretty sure a NN AI could beat most players in Starcraft. Starcraft is actually pretty straightforward, and the meta hasn't changed much for a while, which means the NN will have tons of training data. By seeing the revealed map and learning to send in a scout a few times, the AI could be frightening.
Harassing is also straightforward, and high-level bluffs are relatively hard to pull off in Starcraft (you need to aim at the mineral line, for example), so out-of-the-ordinary experiences are rare.
The training data would give them a considerable advantage against non-experts. However, human pros have managed to bluff them or exploit their patterns in every competition to date. You might be underestimating the risk of bluffs, or just odd behavior, in the upcoming competition. I hope they figure out how to nail that down, as it's critical for AI in general. They just haven't yet.
The other angle is that humans got this good with far less training data and personal exploration. Success achieved by training on all available data would mean AIs can solve problems like this only with massive, accurate hindsight. Problems we solve in the real world often require foresight too, either routinely or in high-impact, rare scenarios. We'd still be on top in terms of results vs. training time even if humanity takes a loss in the competition. :)
I already said there was a huge set of unstructured and structured data fed into the brain over one to two decades before usefulness. The difference is that the brain's architecture doesn't require an insane number of examples of the exact thing you want it to do. It extrapolates from existing data with a small training set. Further, it shows some common sense and adaptation in how it handles weird stuff.
Try doing that with existing schemes. Within their constraints, the data set required would dwarf what a brain takes, with worse results.
The comparison is contaminated by the test being preselected for a skill that humans are good at (i.e. the game has been designed to be within the human skill range of someone from a modern society).
I am sure you could design a game around the strengths of modern AIs that no human could ever win. What would this tell us?
I have no idea. Humans are optimized to win at the real world against each other and all other species. That's a non-ideal environment, too. Designing an AI to win in a world or environment optimized for them might be interesting in some way. Just doubt it would matter for practical applications in a messy world inhabited by humans.
Author here. You would be surprised how much of the input lyrics were just semirandom juxtaposition of phrases. The best hip hop is not this, sure, but some of it certainly is.
Sure, and most human-made music is crap. That's not a diss or a put-down of your work, which is laudable, but I believe the distinction GP presented stands.
It's very hard to produce _great_ content this way, simply because there are a lot more variables and dimensions to writing good music and lyrics.
I made a similar point in defense of Shakespeare[0], and one reply pointed out a kind of category error on my part:
> The samples from all of the examples are nonsense. What's interesting is that they, mostly, follow the form of the original.
These examples appear much less nonsensical than those, but that is (I'd hypothesize) because rap is so much more grammatically and rhetorically liberal. It does make me wonder how we would distinguish "true" semantics from (trained) formal imitations, when the latter are growing in sophistication.
Still, it's a danger in this kind of article that, to quote again from that thread,
> it's a usual tendency of NNs to produce output that looks meaningful to non-experts, yet is complete gibberish to experts.
> This is not just a matter of scaling up to larger networks.
> The answer is simply that our brains are more than neural networks.
At the risk of wasting time arguing against mysticism, there is no evidence for either of these statements. (Well, the latter is technically true, but not in the way I think you mean. There's no particular reason an NN couldn't do anything a brain does.) The only thing we can say with confidence is that the OP's model focuses more on rhyme than content, which is true for a lot of popular rappers as well.
>> There's no particular reason an NN couldn't do anything a brain does.
Brains (read: humans) can learn from very few examples and in very little time. Despite that, we learn a rich context that is flexible enough to constantly incorporate new knowledge and general enough to transfer learning across diverse domains.
Those are all things that ANNs have proven quite incapable of doing, as has any other technology you might want to think of.
You don't need to reach for a mystical explanation, either. Our technology is far, far less advanced than the current hype cycle would have you believe. Thinking that we can reproduce the function of the human brain with computers is the real mystical belief.
Our brain is a 20W meat computer. I'd say that believing it can't be reproduced is quite mystical indeed. It's a matter of time; it'll be a 20MW factory-sized supercomputer at first, but it'll be done. Not saying it'll happen in this decade, or the next, but this century must be it, assuming humanity makes it to the 2100s.
I agree it can be reproduced over time, and I encourage them to do everything they can toward that. It's worth a Manhattan Project just for all the side benefits it would probably lead to. We now have the computation and storage for it, too. One ANN scheme even printed neurons in analog form across an entire wafer, then packaged the whole thing without cutting it up, because brain-like architectures let you do such things. :)
Now for the problem: that's not what most of them are doing. Instead, they're intentionally avoiding how the brain does reasoning and its asynchronous/analog implementation, devising weaker techniques built on synchronous, digital implementations in tinier spaces. They try to make up for this weakness by throwing massive amounts of computation at it, but it's already clear the algorithms themselves are what has to change. Ideally, we'd start experimenting with every version of the brain's own algorithms and structures for specific types of activities, in the brain structures we're pretty sure perform those activities. We might accidentally discover the right stuff for certain problems, and tie the pieces together over time.
That's not what they're doing, though. So they will have to independently invent an entirely new scheme that matches the brain's capabilities using techniques mostly opposite to those it relied on for them. Looks like a losing proposition to me. They might achieve it, but I'd rather the money went toward cloning the brain or its architectural style.
The fact that the brain uses very little power and yet manages to solve really hard problems means that whatever it's doing is very efficient. The fact that ANNs need terabytes of data and petaflops of processing power and still show only rudimentary aptitude at mechanical tasks means they're not very efficient at all. Not that anyone ever called ANNs "efficient" (I'm not talking about backprop, but about iterating through covariance matrices). But if they were as efficient as the brain, they'd be way, way smarter than us by now.
We know from undergraduate comp sci that there are problems that simply cannot be solved except with efficient algorithms. The fact that the brain is doing something terribly efficient is a big hint that whatever it's doing requires that efficiency (because evolution, hand waving, hand waving). ANNs are nothing like that; they're practically brute force.
So how then can anyone expect that we're going to solve the hard problems the brain can, with ANNs?
This is just an outdated view of the state-of-the-art. It's understandable, given that it's outdated by maybe six months, iff you're willing to go with preprints.
A hobbyist looking for something plug-and-play will still generally want lots of data; the cutting edge is not exactly "curl|bash"-able. But the papers coming out this year have been dispatching what I thought would be entire areas of study in a dozen pages, one after another after another.
Not only do I think it's a "when" and not an "if", I think the timelines people throw around date to "ancient" times, meaning a few years ago. Given where we are right now, what we should be asking is whether "decades" should be plural.
I don't see how your endearing enthusiasm is supported by the paper you reference.
It's a paper, so I won't be doing it justice by tl;dr'ing it in three sentences but, in short:
a) One-shot/meta learning is not a new thing; the paper references work by Sebastian Thrun from 1998 [1]. Hardly a six-month-old revolution that's taking the world by storm.
b) There are serious signs that they are overfitting like crazy, and
c) their approach requires few examples, but those examples must be presented hundreds of thousands of times before performance improves. That's still nowhere near the speed or flexibility of human learning.
Also, did you notice they had to come up with a separate encoding scheme, because "learning the weights of a classifier using large one-hot vectors becomes increasingly difficult with scale" [2]? I note that this is a DeepMind paper. If something doesn't scale for them, you can bet it doesn't scale, period.
So, not seeing how this is heralding the one-shot-learning/ meta-learning revolution that I think you're saying it does.
___________
[1] Their reference is: Thrun, Sebastian. Lifelong learning algorithms. In Learning to learn , pp. 181–209. Springer, 1998.
[2] Things are bad enough that they employ this novel encoding even though it does not ensure that a class will not be shared across different episodes, which will have caused some "interference". This is a really bad sign.
"This is just an outdated view of the state-of-the-art. It's understandable, given that it's outdated by maybe six months, iff you're willing to go with preprints."
That's an understandable, but probably incorrect, view that comes from focusing too much on claims in state-of-the-art publications without the wider context of history & brain function. The problem the parent is referring to also includes the general "common sense" we build up over time through an extreme diversity of experiences: a framework developed despite tons of curveballs, and capable of creating curveballs of its own. New knowledge is incorporated into that framework pretty smoothly. An early attempt to duplicate it was the Cyc project's database of common sense. Per Minsky, there are maybe just five or six such efforts in total, with most AI researchers not thinking it's important. Those last words alone told me to be pessimistic.
Whereas the only computer capable of doing what they are trying to do uses a diverse set of subsystems specialized to do their jobs well. A significant amount of it seems dedicated to establishing the common sense that ties all experiences together. The architecture is capable of long-term planning, reacting to stuff, and even doing nothing when that makes sense. It teaches itself these things based on sensory input. It does all of this in real time with what appears to be a mix of analog and digital-like circuits, in a tiny amount of space and with little energy. And despite all this, it still takes over a decade of diverse training data to become effective enough to do stuff like design & publish ANN schemes. :)
There's hardly anything like the brain being done in the ANN research I've seen. The cutting-edge stuff that's made HN is a pale imitation with a small subset of the capabilities, trying to make one thing do it all. The preprint you posted is also rudimentary compared to what I described above. Interestingly, the brain also makes heavy use of feedback designs, whereas most of what I see shared here is feed-forward (as in the late 90's), as if trying to avoid exploring the most effective technique that already solved the problem. Like the linked paper did.
They all just seem to be going in fundamentally wrong directions. Such directions will lead to nice local maxima but miss the global maximum by a long shot. Might as well backtrack while they're ahead if they want the real thing.
Sorry - should have quoted the specific thing I was calling out-of-date, which was from a few comments up-thread:
> Brains (read: humans) can learn from very few examples and in very little time. [...] Those are all things that ANNs have proven quite incapable of doing, as has any other technology you might want to think of.
Note that I said "with reasonable efficiency", not "with some huge number of inputs".
That's because we don't need to represent any function; we need to represent the class of functions that can be efficiently represented in a human brain as well, which is pretty much the same. Note that we can also implement any boolean component with only a few neurons in an NN, and using an RNN gives us working memory as well, so we can implement any sort of digital processor with reasonable efficiency in an RNN (where "reasonable efficiency" means "a linear multiple of the number of components in the original circuit").
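To make that boolean-component claim concrete, here's a minimal sketch in plain Python (hand-picked weights, no training involved, and not from any comment above): a single threshold unit computes NAND, and NAND alone is universal for digital logic.

```python
# A single threshold neuron with hand-picked weights computes NAND;
# since NAND is functionally complete, networks of such units can
# implement any boolean circuit (and an RNN adds working memory).

def neuron(inputs, weights, bias):
    """Threshold unit: fires iff the weighted sum plus bias is positive."""
    return int(sum(w * x for w, x in zip(weights, inputs)) + bias > 0)

def nand(a, b):
    # -2*(a + b) + 3 > 0 exactly when (a, b) != (1, 1)
    return neuron([a, b], weights=[-2, -2], bias=3)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", nand(a, b))  # 1, 1, 1, 0: the NAND truth table
```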
>> That's because we don't need to represent any function; we need to represent the class of functions that can be efficiently represented in a human brain as well, which is pretty much the same.
The problem is that to learn a function from examples you need the right kind of examples, and for human cognitive faculties it's very hard to get that.
For instance, take text: text is a staple in machine-learning models of language... but it is not language. It's a bunch of symbols that are only intelligible in the context of an already existing language faculty. In other words, text means nothing unless you already understand language, which is why, although we can learn pretty good models of text, we haven't made much progress in learning models of language. Computers can generate or recognise language pretty damn well, but when it comes to understanding it... well, we haven't even convincingly defined that task, let alone been able to train anything, RNN or whatever, to perform it.
You can see similar issues with speech or image processing, where btw RNNs have performed much better than with language.
So, just because RNNs can learn functions in principle, it doesn't mean that we can really reproduce human behaviour in practice.
Brains take 2+ years of constant training before they start to do much of anything we would associate with strong AI, and another 10 or so years of constant training before they can do anything worth spending money on. I'm not sure how you call that "very few examples". Brains do have some high-level inference facilities that work on smaller data sets, but the support hardware for that appears to be genetically coded to a large degree, and we can make computers do a lot of that sort of stuff too. No reason we couldn't make a big NN do the same.
> Thinking that we can reproduce the function of the human brain with computers is what is the real mystical belief.
No, not really. Most physicists believe that physics is either computable or approximable to below the noise floor. Thinking otherwise requires some sort of mystical religious belief about non-physical behavior.
>> Brains take 2+ years of constant training before they start to do much of anything we would associate with strong AI, and another 10 or so years of constant training before they can do anything worth spending money on. I'm not sure how you call that "very few examples".
You're talking about human brains. The brains of, say, gazelles, are ready for surviving in an extremely hostile environment a few minutes after they are born. See for example [1]. Obviously they can't speak or do arithmetic, but they can navigate their surroundings with great competence, find sustenance (even just their mothers' teat) and avoid danger.
That's already far, far beyond the capabilities of current AI, and if I could make a system even half that smart I'd be the most famous woman on the planet. Honestly. And also the richest. And most powerful. Screw Elon Musk and his self-driving cars; I'd rule the world with my giant killer robots of doom :|
Also- "very few examples": that's the whole "poverty of the stimulus" argument. In short, babies learn to speak without ever hearing what we would consider enough language. Noam Chomsky used that to argue for an innate "universal grammar" but there must be at least some learning performed by babies before they learn to speak their native language, and they manage it after hearing only very, very little of it.
Are you saying that brains will eventually be possible to copy with computers? In a thousand years, with completely different computers, maybe. Why not. But with current tech, forget about it.
General consensus is that this is hard-wired genetic behavior. It's mildly impressive, but nothing that we think we couldn't do on a computer with enough time and effort.
> In short, babies learn to speak without ever hearing what we would consider enough language.
All known humans who were deprived of social contact during early development were unable to learn speech later on. Babies get a ton of language stimulus; I'm not sure where you're getting "what we would consider enough".
> In a thousand years, with completely different computers, maybe.
We're only a few orders of magnitude off from standard COTS computer equipment being able to match the throughput you would expect from a human brain doing one "useful" thing per neuron at several kHz (which is probably a gross overestimation). Even if we decided to do a full neurophysiological simulation of every neuron in the brain, that only adds a few more orders of magnitude of required compute power.
We expect to hit $1/(TFLOP/s) over the next 20 years or so, and there's physically no way the brain is doing more than a PFLOP/s, unless neurons are doing some insane amount of work at a sub-neuronal level (which, I admit, is possible, but quite unlikely).
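For what it's worth, here's the back-of-envelope arithmetic behind that estimate; all figures are the rough assumptions stated above (neuron count, "several kHz", one useful op per neuron event), not measurements.

```python
# Rough brain-throughput estimate under the assumptions stated above.
neurons = 8.6e10       # approximate human neuron count
rate_hz = 5e3          # "several kHz" of useful events per neuron (generous)
flops_per_event = 1.0  # one useful op per neuron event

brain_flops = neurons * rate_hz * flops_per_event
print(f"{brain_flops:.1e} FLOP/s")  # ~4.3e14, i.e. under half a PFLOP/s
```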
I would propose a long-term bet, but I'm not sure what the conditions would be.
I haven't seen evidence of that. They're given a few things to start with. Then they seem to apply a hyper-effective scheme for learning on raw data that combines personal exploration (unsupervised) and societal guidance (supervised). It then takes these brains nearly two decades of training data & experiences to become effective in the real world. Virtually everything people say about what ANNs might accomplish leaves off that last part, which was critical to the superior architecture they're trying to compete with.
Not sure what you mean; imo the D-Prime lyrics were pretty good, sometimes very clever due to unusual word choice. It sounds a lot like Illmatic-era Nas.
On a technical level that's not true. Illmatic was full of 3- and 4-syllable rhymes, while the D-Prime example rhymes are almost all 1 syllable.
Also, Nas is very smooth and fluent. The lyrical themes may be closer to Nas, but in terms of flow D-Prime is more like early hip hop: Sugar Hill Gang, Kurtis Blow, Grandmaster Flash, etc.
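For anyone curious how you'd check the syllable-depth claim programmatically, here's a rough sketch using the `pronouncing` package (a wrapper around the CMU Pronouncing Dictionary); the depth metric itself is my own crude approximation, not anything from the post.

```python
import re
import pronouncing  # pip install pronouncing

def tail_vowels(word):
    """Vowel phonemes of a word, stress digits stripped."""
    phones = pronouncing.phones_for_word(word.lower())
    if not phones:
        return []
    # In ARPAbet, only vowel phonemes carry a trailing stress digit.
    return [re.sub(r"\d", "", p) for p in phones[0].split() if p[-1].isdigit()]

def rhyme_depth(a, b):
    """Count how many trailing vowel sounds two words share."""
    va, vb = tail_vowels(a), tail_vowels(b)
    depth = 0
    for x, y in zip(reversed(va), reversed(vb)):
        if x != y:
            break
        depth += 1
    return depth

print(rhyme_depth("cat", "hat"))          # 1: a one-syllable rhyme
print(rhyme_depth("animal", "criminal"))  # 2: a multisyllabic rhyme
```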
Our brains may be more than the current iteration of extremely simple, layer-by-layer neural networks, but there's no reason to think they're something fundamentally different than the NN principle of parallel distributed processing at many nodes.
Author here. I'm sad that you think that. I love hip hop, and I know how far away this model is from the best MCs. The model has no idea what words actually mean, it's just spitting out a grammar it has learned. We are nowhere near the point where an AI can formulate an original thought and articulate it in rhyme, and I even called my model a 'parlour trick' in the post. It was just supposed to be fun.
OP and others may be interested in this approach by Sony CSL from ~2012, which uses constrained Markov chains to match lyrical style under explicit rhyme constraints, including (e.g.) Bob Dylan's style constrained to the Beatles' "Yesterday" [0].
This is the precursor to the work that brought us "Daddy's Car" [1], though the techniques appear to have changed a bit, judging by their recent publications. I find their paper on recent approaches to music synthesis with style matching and other constraints [2] quite readable, and really good! I am hoping to build out a version of what the paper describes. There is also a very nice talk by F. Pachet on this subject here [3], and his older videos are good too (they should show up in YouTube's recommendations).
Still reading about DeepRhyme, but it seems great. Also a fantastic writeup!
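For a flavor of the constrained-Markov idea, here's a toy sketch. Real Markov-constraint models (the Pachet work above) propagate hard constraints through the transition matrix; this sketch cheats by training on reversed lines and generating backwards from a constrained end word, which is enough to guarantee an end rhyme. The corpus lines are made up.

```python
import random
from collections import defaultdict

def train_backwards(lines):
    """Map each word to the words that can precede it (None = line start)."""
    prev = defaultdict(list)
    for line in lines:
        words = line.lower().split()
        if not words:
            continue
        prev[words[0]].append(None)
        for i in range(1, len(words)):
            prev[words[i]].append(words[i - 1])
    return prev

def generate_line(prev, end_word, max_len=8):
    """Walk backwards from the constrained end word toward a line start."""
    out = [end_word]
    while len(out) < max_len:
        choices = prev.get(out[-1])
        if not choices:
            break
        w = random.choice(choices)
        if w is None:  # hit a legal line start
            break
        out.append(w)
    return " ".join(reversed(out))

corpus = ["i keep the flow tight all night",
          "the beat goes on till the morning light"]
model = train_backwards(corpus)
print(generate_line(model, end_word="light"))  # always ends on "light"
```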
Really drives home both how much more sophisticated hip hop has become and the incredible degree of difficulty in trying to simulate such sophisticated language structures.
Eminem has similar trickery going on in most songs, and that's probably one of the reasons for his success. Of course many other rappers do the same, and Eminem surely isn't the inventor of funky rhyme patterns.
It's been 15 years but I used to analyze the patterns. Brings back memories.
A rap lyrics generation algorithm using deep learning was also recently published [1] at the KDD conference, and it got quite a lot of publicity in other media; see the associated website at http://deepbeat.org.
Author of DeepRhyme here. With respect to the authors of DeepBeat, what they are doing is less ambitious: they take full existing lines from a rap lyric corpus and assemble them into a verse that rhymes and makes sense. There is a paper that is very similar to what I did: http://www.emnlp2015.org/proceedings/EMNLP/pdf/EMNLP221.pdf . It's hard to compare our models, because they don't give much output text. They trained on 1% of the data I did, so I'm a bit dubious about how successful they could have been.
I am writing a followup post where I'm going to talk about previous work, I hope no one takes any disrespect.
Instead of completely banning actual lines from other material, it might be interesting to allow D-Prime to quote or slightly modify a phrase if it met some high threshold of notability.
I have been working on exactly this for the past few weeks. I went ahead and produced a song and a lyric video with mine (I call it Lil' Markov). It started out because I noticed a pattern in Lil Wayne songs where he'd say some idiom and then rhyme off it. It seemed so predictable I figured I could make a bot that automated Lil Wayne's process...
From what I can tell our methods are very similar. I have a feature that lets you input a concept so it will try to rap around that concept, and a `whiteGuilt()` function (that's what's up with all the ninjas), but other than that our process is 90% similar.
As a huge hip-hop head, I will just add that there are two aspects to evaluating rhymes: How it sounds when delivered by the artist, and how it reads. The best rap succeeds in both domains. Bob Dylan won the Nobel Prize because he succeeds in both domains. In this case, we obviously can only consider the second domain, how it reads. So we should evaluate the results exactly how we would evaluate any other poem (while giving due respect to the conventions and tropes of the form, ie, rap has its own classic metaphors and themes).
I think you're on to something here! It might be a while before we train a NN that can out-rap the pros. But can we raise the bar for applying the style to alternative content domains to generate educational raps that aren't lame?
I wonder if approaches like this will work for poetry (or prose) from different cultures. I'd be especially interested to see if some of these techniques are effective on languages markedly different from English. (Maybe Chinese and Arabic?)
This is great. Well done to OP! I've been considering creating something like this in the future as well.
You mentioned that you used the lyrics from 50,000 rap songs. Did you have any filters for quality or certain sub-genres? What era were the rap songs from? Maybe the 'quality' would be higher if the data set was narrowed to what you consider quality hip hop lyrics.
Also from your post: "You can see that D-Prime does internally rhyme to a small extent, but making the model better in this regard seemed like it was going to be tough, and I didn’t pursue it."
What were your main challenges in improving/pursuing better internal rhyme schemes?
Hey man, thanks! I didn't try to filter to any subgenres, but I did try to give the model as easy a time as possible by sticking to verses that used the most common words (lowering the perplexity of the corpus). I would call the resulting genre 'mediocre', which may explain why I had such a hard time with the sexism thing.
Internal rhyming/assonance and other stuff could be done by rewarding the model for using it during the beam search. I suspect it's going to be hard to tune that reward so the output still makes sense; it only just makes sense already. I also think at that point you would really need to sit down and figure out what rhymes and what doesn't. I did the quickest, hackiest thing I could think of, which was good enough for generating end rhymes, but if you want internal rhymes without massively constraining the model, I think you need to work it out purely from analysing the syllables.
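In case it helps anyone experimenting with this, here's a minimal sketch of the reward-during-beam-search idea. `lm_score` and `rhymes_with` are toy stand-ins, not the author's code: a real system would use the trained model's next-word log-probs and a proper syllable-based rhyme test, and the bonus value is arbitrary.

```python
import heapq

def beam_search(lm_score, rhymes_with, start, beam_width=5,
                steps=6, rhyme_bonus=1.5):
    """Keep the top-k hypotheses per step; reward internal rhymes."""
    beams = [(0.0, [start])]
    for _ in range(steps):
        candidates = []
        for score, words in beams:
            for word, logp in lm_score(words):
                # Bonus if the candidate rhymes with any recent word.
                bonus = rhyme_bonus if any(
                    rhymes_with(word, w) for w in words[-5:]) else 0.0
                candidates.append((score + logp + bonus, words + [word]))
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(beams, key=lambda c: c[0])[1]

# Toy stand-ins so the sketch runs end to end.
def lm_score(words):
    return [("cat", -1.0), ("hat", -1.2), ("dog", -0.9), ("log", -1.1)]

def rhymes_with(a, b):
    return a != b and a[-2:] == b[-2:]  # crude suffix test, not real rhyming

print(" ".join(beam_search(lm_score, rhymes_with, start="the")))
```

Tuning `rhyme_bonus` against the language-model score is exactly the "hard to tune" trade-off described above: too high and the search picks nonsense that rhymes, too low and the bonus never changes the ranking.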
I have to giggle a little bit at the implications of taking source material that's just about every negative -ist it's possible to be, and then using ML to generate a version without that content.
It would be fascinating to try training it on some HRC and Trump speeches. I am doubtful most people would be able to tell the difference between generated remarks and the real thing.
I'm sensing an impending gangsta job shortage. I wonder: if machines become better at being outlawz, might these individuals pursue some form of higher street education to keep up?