Show HN: DeepRhyme (D-Prime) – Generating dope rhymes with machine learning (swarbrickjones.wordpress.com)
208 points by sweezyjeezy on Nov 7, 2016 | 80 comments



To the uninitiated these may seem like reasonable results - because to the uninitiated, hip hop is just a bunch of unintelligible words strung together anyway - but these lyrics don't make much sense. There's no cohesive stream of thought behind the lines. It's a semi-random juxtaposition of phrases, and a great example of one of the biggest limitations of deep learning systems: even the best are only able to maintain very limited context. This is not just a matter of scaling up to larger networks. Current methods require exponentially larger networks to achieve linear increases in context handling. So then why can our brains do it? The answer is simply that our brains are more than neural networks.


> So then why can our brains do it?

Because current NNs only simulate, like, less than 1 mm^3 of brain matter. Someone writing lyrics for a song has millions of such tiny networks working concurrently in their brain - and then there are higher-level networks supervising and aggregating the smaller nets, and so on.

Current AI NN architectures are flat and have no high-level structure. There's no hierarchy. There's no plurisemantic context spanning large time intervals and logic trees. No working memory organized into short-, mid- and long-term stores. Etc, etc, etc.

We're not even scratching the surface yet.


Further, the brain uses subsystems whose architectures are good at the specific problem they're solving, with smooth integration that somehow often preserves context. The approach of current ANNs seems to be the equivalent of taking the visual cortex, or some other single subsystem, and trying to apply it to every other subsystem's area of expertise. It will do interesting things but always fall short, since it's architecturally not the right tool for the job.

It's why my money is on the humans for the upcoming Starcraft challenge. Like poker, bluffing is integral to the process. AIs have had a really hard time with that in poker, except in constrained games. Starcraft provides enough open-ended opportunities that the AI will get smashed by a clever human. Hip hop similarly combines tons of references to pop culture, psychology of what goes with what, coordination of presentation, and established musical tricks. At least half of those, ANNs suck at by design.


There are two ways the AI could win at Starcraft:

1. The game does not really depend on a larger context. What you see is what you get. The "muscle memory" of a relatively simple ANI could therefore be enough. This is partially in contradiction with what you said above about bluffing, but I feel the contradiction is less than 50%.

2. Simple "muscle memory" strategies should not be enough to win the game, but the ANI's lightning-fast reactions and its ability to see the whole game at once are enough to outperform more sophisticated-thinking humans, who are slower and have tunnel vision w.r.t. the game. Basically the brute-force approach.

I'm not placing bets, and I'm as curious as everyone else as to the result of the contest. I'm just saying - if the AI does win, these are the ways it could do that.

I'm using the expression "muscle memory", which is inadequate, because I have no better way to express how current NNs operate. They are dumb at the higher semantic levels. They only become powerful through colossal repetition and reinforcement.

Watching current NNs being trained never fails to give me flashbacks from my college days when I was practicing karate. We would go to the dojo, pick a technique, and then repeat it an enormous number of times, to let the sequence sink into muscle memory. I'm sure some (natural) NNs in my brain still have that thing down pat - I don't have to think about doing the techniques, they just "execute" on their own. But there's no semantic level here, it's just dumb (but blazing fast) automation.


It's possible, but with a human involved it's a lot more context-driven than you think. The bots do great against each other with their strategies and muscle-memory stuff. Throw a human in the mix and they start noticing patterns in how the units are managed, what strategies are likely, etc. Humans have exploited these in the competitions to hilarious effect. Here's the main site on prior work & results at the competitions:

https://www.cs.mun.ca/~dchurchill/starcraftaicomp/reports.sh...

Here are a few examples of how the muscle-memory approach, especially if focused on unit-vs-unit fighting, can fail against humans.

" In this example, Bakuryu (human) notices that Skynet's units will chase his zerglings if they are near, and proceeds to run around Skynet's base to distract Skynet long enough so that Bakuryu can make fying units to come attack Skynet's base. This type of behaviour is incredibly hard to detect in a bot, since it requires knowledge of the larger context of the game which may only have consequences 5 or more minutes in the future. " (2013)

Note: At this point, they also sucked at building and expansion strategies, which surprised me since I thought a basic planner would be able to handle that. The constraints between rate of expansion, where to place stuff, what units to keep/build, and so on get really hard. The other thing they weren't good at was switching strategies mid-game based on what opponents were doing.

" despite Djem5 (pro) making the bots look silly this year... they were able to defeat D-ranked, and even some C-ranked players. After the human players have played one or two games against the bots they are then easily able to detect and exploit small mistakes that the bots make in order to easily win the majority of games... " (2015)

I don't have the 2016 results yet. It's clear they're getting better, but there's a huge gap between bots and humans. The gap seems to be context, reading opponents, and expansion. Now, if they can fix those, combined with the machines' inherent strength in micromanagement and muscle memory on specific attack/defense patterns, they have a chance of taking down pros.

Examples below of the AIs managing units in perfect formation & reaction. It's like an emergent ballet or something. The second one could be terrifying if they can get it into real-world military use. Figured you might enjoy these given your position. :)

https://www.youtube.com/watch?v=DXUOWXidcY0

https://www.youtube.com/watch?v=IKVFZ28ybQs


I'm pretty sure a NN AI could beat most players in Starcraft. Starcraft is actually pretty straightforward and the meta hasn't changed much for a while, which means the NN will have tons of training data. By seeing the revealed map and learning to send in a scout a few times, the AI could be frightening.

Harassment is also straightforward, and high-level bluffs are relatively hard to pull off in Starcraft (you need to aim at the mineral line, for example), so out-of-the-ordinary experiences are rare.


The training data would give them a considerable advantage against non-experts. However, the human pros have managed to bluff them or exploit their patterns in every competition to date. You might be underestimating the risk of bluffs or just odd behavior in the upcoming competition. I hope they figure out how to nail it down, as it's critical for AI in general. They just haven't yet.

The other angle is that humans got this good with way less training data and personal exploration. Success with training on all available data would mean AIs could solve problems like this only with massive, accurate hindsight. Problems we tackle in the real world often require foresight, too, either regularly or in high-impact, rare scenarios. We'd still be on top in terms of results vs training time even if humanity takes a loss in the competition. :)


>The other angle is that humans got this good with way less training data and personal exploration.

I think you are underestimating how much training goes into a human. I would not mind seeing how a newborn baby does against one of these AIs.


I already said there was a huge set of unstructured and structured data fed into the brain over one to two decades before usefulness. The difference is that brain architecture doesn't require an insane number of examples of the exact thing you want it to do. It extrapolates from existing data with a small training set. Further, it shows some common sense and adaptation in how it handles weird stuff.

Try doing that with existing schemes. The data set within their constraints would dwarf what a brain needs, with worse results.


The comparison is contaminated by the test being preselected for a skill that humans are good at (i.e. the game has been designed to be within the human skill range of someone from a modern society).

I am sure you could design a game around the strengths of modern AI’s that no human could ever win. What would this tell us?


I have no idea. Humans are optimized to win at the real world against each other and all other species. That's a non-ideal environment, too. Designing an AI to win in a world or environment optimized for them might be interesting in some way. Just doubt it would matter for practical applications in a messy world inhabited by humans.


Author here. You would be surprised how much of the input lyrics were just semi-random juxtapositions of phrases. The best hip hop is not this, sure, but some of it certainly is.


Sure, and most human-made music is crap. That's not a diss or a put-down of your work, which is laudable, but I believe the distinction GP presented stands.

It's very hard to produce _great_ content this way, simply because there are a lot more variables and dimensions to writing good music and lyrics.


I made a similar point in defense of Shakespeare[0], and one reply pointed out a kind of category error on my part:

> The samples from all of the examples are nonsense. What's interesting is that they, mostly, follow the form of the original.

These examples appear much less nonsensical than those, but that is (I'd hypothesize) because rap is so much more grammatically and rhetorically liberal. It does make me wonder how we would distinguish "true" semantics from (trained) formal imitations, when the latter are growing in sophistication.

Still, it's a danger in this kind of article that, to quote again from that thread,

> it's a usual tendency of NNs to produce output that looks meaningful to non-experts, yet is complete gibberish to experts.

[0] https://news.ycombinator.com/item?id=12338081


Nah, they're less nonsensical because of the beam search sampling approach I took. If you don't do this, they're just as bad as that.
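
Schematically, the beam search is something like this - a heavily simplified sketch, not the actual D-Prime code, and `lm_next_probs` is a stand-in for whatever next-word model you have:

  import heapq
  import math

  def beam_search(lm_next_probs, start, beam_width=10, length=8):
      # Each beam entry is (cumulative negative log-prob, token sequence).
      beams = [(0.0, [start])]
      for _ in range(length):
          candidates = []
          for neg_logp, seq in beams:
              # lm_next_probs(seq) -> {token: prob}; placeholder for the model.
              for tok, p in lm_next_probs(seq).items():
                  candidates.append((neg_logp - math.log(p), seq + [tok]))
          # Keep only the beam_width most probable partial sequences.
          beams = heapq.nsmallest(beam_width, candidates, key=lambda c: c[0])
      return [seq for _, seq in beams]

Keeping whole candidate lines around, instead of committing to one token at a time, is what stops the output degenerating into word salad.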


> This is not just a matter of scaling up to larger networks.

> The answer is simply that our brains are more than neural networks.

At the risk of wasting time arguing against mysticism, there is no evidence for either of these statements. (Well, the latter is technically true, but not in the way I think you mean. There's no particular reason an NN couldn't do anything a brain does.) The only thing we can say with confidence is that the OP's model focuses more on rhyme than content, which is true for a lot of popular rappers as well.


>> There's no particular reason an NN couldn't do anything a brain does.

Brains (read: humans) can learn from very few examples and in very little time. Despite that, we learn a rich context that is flexible enough to constantly incorporate new knowledge and general enough to transfer learning across diverse domains.

Those are all things that ANNs have proven quite incapable of doing, as has any other technology you might want to think of.

You don't need to reach for a mystical explanation, either. Our technology is far, far less advanced than the current hype cycle would have you believe. Thinking that we can reproduce the function of the human brain with computers is the real mystical belief.


our brain is a 20W meat computer. i'd say that believing it can't be reproduced is quite mystical indeed. it's a matter of time; it'll be a 20MW factory-sized supercomputer at first, but it'll be done. not saying it'll happen in this decade, or the next, but this century must be it, assuming humanity makes it to 2100s.


I agree it can be reproduced over time, and I encourage them to do everything they can to achieve that. It's worth a Manhattan Project just because of all the side benefits it will probably lead to. We now have the computation and storage for it, too. One ANN scheme even printed neurons in analog form across an entire wafer, then packaged the whole thing without cutting it. Because brain-like architectures let you do such things. :)

Now for the problem: that's not what most of them are doing. Instead, they're intentionally avoiding how the brain does reasoning and asynchronous/analog implementation, to devise weaker techniques built on synchronous, digital implementations in tinier spaces. They try to make up for this weakness by throwing massive amounts of computation at it, but it's already clear the algorithms themselves are what has to change. Ideally, we'd start experimenting with every version of the brain's own algorithms and structures for specific types of activities, in brain structures we're pretty sure perform such activities. We might accidentally discover the right stuff for certain problems. Tie them together over time.

That's not what they're doing, though. So they will have to independently invent an entirely new scheme that matches the brain's capabilities, with techniques mostly opposite to what it relied on for those capabilities. Looks like a losing proposition to me. They might achieve it, but I'd rather the money go to cloning the brain or its architectural style.


So, I don't get this.

The fact that the brain uses very little power and yet manages to solve really hard problems means that whatever it's doing is very efficient. The fact that ANNs need terabytes of data and petaflops of processing power and still can only show rudimentary aptitude in mechanical tasks means they're not very efficient at all. Not that anyone ever called ANNs "efficient" (I'm not talking about backprop, but about iterating through covariance matrices). But if they were as efficient as the brain, they'd now be way, way smarter than us.

We know from undergraduate Comp Sci that there are problems that simply cannot be solved except with efficient algorithms. The fact that the brain is doing something terribly efficient is a big hint that whatever it's doing requires it to be that (because evolution hand waving hand waving). ANNs are nothing like that - they're practically brute force.

So how then can anyone expect that we're going to solve the hard problems the brain can, with ANNs?


This is just an outdated view of the state-of-the-art. It's understandable, given that it's outdated by maybe six months, iff you're willing to go with preprints.

https://arxiv.org/abs/1605.06065

A hobbyist looking for something plug-and-play will still generally want lots of data; the cutting edge is not exactly "curl|bash"-able. But the papers coming out this year have been dispatching what I thought would be entire areas of study in a dozen pages, one after another after another.

Not only do I think it's a "when" and not an "if", I think the timelines people throw around date to "ancient" times - meaning, a few years ago. Given where we are right now, what we should be asking is whether "decades" should be plural.


I don't see how your endearing enthusiasm is supported by the paper you reference.

It's a paper, so I won't be doing it justice by tl;dr'ing it in three sentences but, in short:

a) One-shot/meta-learning is not a new thing; the paper references work by Seb. Thrun from 1998 [1]. Hardly a six-month-old revolution that's taking the world by storm.

b) There are serious signs that they are overfitting like crazy, and

c) their approach requires few examples but they must be presented hundreds of thousands of times before performance improves. That's still nowhere near the speed or flexibility of human learning.

Also, did you notice they had to come up with a separate encoding scheme, because "learning the weights of a classifier using large one-hot vectors becomes increasingly difficult with scale" [2]? I note that this is a DeepMind paper. If something doesn't scale for them you can betcha it doesn't scale, period.

So, I'm not seeing how this heralds the one-shot-learning / meta-learning revolution that I think you're saying it does.

___________

[1] Their reference is: Thrun, Sebastian. Lifelong learning algorithms. In Learning to learn , pp. 181–209. Springer, 1998.

[2] Things are bad enough that they employ this novel encoding even though it does not ensure that a class will not be shared across different episodes, which will have caused some "interference". This is a really bad sign.


"This is just an outdated view of the state-of-the-art. It's understandable, given that it's outdated by maybe six months, iff you're willing to go with preprints."

That's an understandable, but probably incorrect, view that comes from focusing too much on claims in state-of-the-art publications without the wider context of history & brain function. The problem parent is referring to also includes the general "common sense" we build up over time from an extreme diversity of experiences - developed despite tons of curveballs, and including the ability to throw curveballs ourselves. New knowledge is incorporated into that framework pretty smoothly. An early attempt to duplicate that was the Cyc project's database of common sense. Per Minsky, there are maybe just five or six such efforts total, with most AI researchers not thinking it's important. That alone told me to be pessimistic.

Whereas the only computer capable of doing what they are trying to do uses a diverse set of subsystems, each specialized to do its job well. A significant amount of it seems dedicated to establishing common sense tying all experiences together. The architecture is capable of long-term planning, reacting to stuff, and even doing nothing when that makes sense. It teaches itself these things based on sensory input. It does it all in real-time with what appears to be a mix of analog and digital-like circuits, in a tiny amount of space and energy. And despite this, it still takes over a decade of diverse training data to become effective enough to do stuff like design & publish ANN schemes. :)

There's hardly anything like the brain being done in the ANN research I've seen. The cutting-edge stuff that's made HN is a pale imitation with a small subset of those capabilities, trying to make one thing do it all. The preprint you posted is also rudimentary compared to what I described above. Interestingly, the brain also makes heavy use of feedback designs, whereas most of what I see shared here (as in the late 90's) is feed-forward - as if trying to avoid exploring the most effective technique that already solved the problem. Like the linked paper did.

They all just seem to be going in fundamentally wrong directions. Such directions will lead to nice local maxima but miss the global maximum by a long shot. Might as well backtrack while they're ahead if they want the real thing.


Sorry - should have quoted the specific thing I was calling out-of-date, which was from a few comments up-thread:

> Brains (read: humans) can learn from very few examples and in very little time. [...] Those are all things that ANNs have proven quite incapable of doing, as has any other technology you might want to think of.


Fair enough. That's true - ANNs are getting better on that front.


> So how then can anyone expect that we're going to solve the hard problems the brain can, with ANNs?

Because RNNs can approximate any self-referential circuit with reasonable efficiency. Just like the brain does with neurons.


Sure. And all multi-layer networks with three or more layers can approximate any function with arbitrary accuracy... given sufficiently many units.

Where "sufficiently many" translates as "for real problems, just too many".


Note that I said "with reasonable efficiency", not "with some huge number of units".

That's because we don't need to represent any function; we need to represent the class of functions that can be efficiently represented in a human brain as well, which is pretty much the same. Note that we can also implement any boolean component with only a few neurons in an NN, and using an RNN gives us working memory as well, so we can implement any sort of digital processor with reasonable efficiency in an RNN (where "reasonable efficiency" means "a linear multiple of the number of components in the original circuit").
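
To make the boolean-component point concrete, here's a minimal sketch, assuming simple threshold units - one neuron suffices for NAND, which is universal, so any digital circuit follows by composition:

  import numpy as np

  def neuron(x, w, b):
      # A single threshold unit: fires iff w.x + b > 0.
      return float(np.dot(w, x) + b > 0)

  def nand(a, b):
      # Weights hand-picked so the unit computes NAND, a universal gate.
      return neuron(np.array([a, b]), np.array([-1.0, -1.0]), 1.5)

  def xor(a, b):
      # Any boolean circuit composes from NANDs, e.g. the standard 4-gate XOR.
      n1 = nand(a, b)
      return nand(nand(a, n1), nand(b, n1))

  for a in (0, 1):
      for b in (0, 1):
          print(a, b, xor(a, b))  # 0 0 0.0 / 0 1 1.0 / 1 0 1.0 / 1 1 0.0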


>> That's because we don't need to represent any function; we need to represent the class of functions that can be efficiently represented in a human brain as well, which is pretty much the same.

The problem is that to learn a function from examples you need the right kind of examples, and for human cognitive faculties it's very hard to get that.

For instance, take text - text is a staple in machine-learning models of language... but it is not language. It's a bunch of symbols that are only intelligible in the context of an already existing language faculty. In other words, text means nothing unless you already understand language, which is why, although we can learn pretty good models of text, we haven't made much progress in learning models of language. Computers can generate or recognise language pretty damn well - but when it comes to understanding it... Well, we haven't even convincingly defined that task, let alone trained anything, RNN or otherwise, to perform it.

You can see similar issues with speech or image processing, where btw RNNs have performed much better than with language.

So, just because RNNs can learn functions in principle it doesn't mean that we can really reproduce human behaviour in practice.


Did you forget that the brain has evolved over billions of years to reach the capability it has?


Brains take 2+ years of constant training before they start to do much of anything we would associate with strong AI, and another 10 or so years of constant training before they can do anything worth spending money on. I'm not sure how you call that "very few examples". Brains do have some high-level inference facilities that work on smaller data sets, but the support hardware for that appears to be genetically coded to a large degree, and we can make computers do a lot of that sort of stuff too. No reason we couldn't make a big NN do the same.

> Thinking that we can reproduce the function of the human brain with computers is what is the real mystical belief.

No, not really. Most physicists believe that physics is either computable or approximable to below the noise floor. Thinking otherwise requires some sort of mystical religious belief about non-physical behavior.


>> Brains take 2+ years of constant training before they start to do much of anything we would associate with strong AI, and another 10 or so years of constant training before they can do anything worth spending money on. I'm not sure how you call that "very few examples".

You're talking about human brains. The brains of, say, gazelles, are ready to survive in an extremely hostile environment a few minutes after they are born. See for example [1]. Obviously they can't speak or do arithmetic, but they can navigate their surroundings with great competence, find sustenance (even just their mothers' teat) and avoid danger.

That's already far, far beyond the capabilities of current AI, and if I could make a system even half that smart I'd be the most famous woman on the planet. Honestly. And also, the richest. And most powerful. Screw Elon Musk and his self-driving cars - I'd rule the world with my giant killer robots of doom :|

Also- "very few examples": that's the whole "poverty of the stimulus" argument. In short, babies learn to speak without ever hearing what we would consider enough language. Noam Chomsky used that to argue for an innate "universal grammar" but there must be at least some learning performed by babies before they learn to speak their native language, and they manage it after hearing only very, very little of it.

Are you saying that brains will eventually be possible to copy with computers? In a thousand years, with completely different computers, maybe. Why not. But with current tech, forget about it.

_________

[1] https://www.youtube.com/watch?v=ANGD5cE2WoQ


> The brains of, say, gazelles,

General consensus is that this is hard-wired genetic behavior. It's mildly impressive, but nothing that we think we couldn't do on a computer with enough time and effort.

> In short, babies learn to speak without ever hearing what we would consider enough language.

All known humans who were deprived of social contact during early development were unable to learn speech later on. Babies get a ton of language stimulus; I'm not sure where you're getting "what we would consider enough".

> In a thousand years, with completely different computers, maybe.

We're only a few orders of magnitude off from standard COTS computer equipment being able to match the throughput you would expect from a human brain doing one "useful" thing per neuron at several kHz (which is probably a gross overestimation). Even if we decided to do a full neurophysiological simulation for every neuron in the brain, that only adds a few more orders of magnitude required compute power.

We expect to hit $1/(TFLOP/s) over the next 20 years or so, and there's physically no way the brain is doing more than a (PFLOP/s), unless neurons are doing some insane amount of work at a sub-neuronal level (which, I admit, is possible, but quite unlikely).
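
For the record, the back-of-envelope behind that claim (rough numbers - the neuron count is the usual textbook estimate, and the firing rate is deliberately generous):

  neurons = 8.6e10      # ~86 billion neurons, the standard estimate
  rate_hz = 5e3         # "several kHz" per neuron, almost certainly too high
  ops_per_sec = neurons * rate_hz
  print(ops_per_sec / 1e15)  # ~0.43, i.e. under half a PFLOP/s-equivalent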

I would propose a long-term bet, but I'm not sure what the conditions would be.


> Brains (read: humans) can learn from very few examples and in very little time.

Brains leave the womb pre-trained.


I haven't seen evidence of that. They're given a few things to start with. Then they seem to apply a hyper-effective scheme for learning on raw data that combines personal exploration (unsupervised) and societal guidance (supervised). It then takes these brains nearly two decades of training data & experiences to become effective in the real world. Virtually everything people say about what ANNs might accomplish leaves out that last part, which was critical to the superior architecture they're trying to compete with.


> They're given a few things to start with.

That's what I just said.


Not sure what you mean; imo the D-Prime lyrics were pretty good, sometimes very clever due to unusual word choice. It sounds a lot like Illmatic era Nas.


> It sounds a lot like Illmatic era Nas.

On a technical level that's not true. Illmatic was full of 3- and 4-syllable rhymes, while the D-Prime example rhymes are almost all 1-syllable.

Also Nas is very smooth and fluent. The lyrical themes may be closer to Nas, but in terms of flow D-Prime is more like early hip hop - Sugar Hill Gang, Kurtis Blow, Grandmaster Flash etc.


Our brains may be more than the current iteration of extremely simple, layer-by-layer neural networks, but there's no reason to think they're something fundamentally different than the NN principle of parallel distributed processing at many nodes.


Rap music says something. It's hard not to read this project as just a further disenfranchisement of its authors.


Author here. I'm sad that you think that. I love hip hop, and I know how far away this model is from the best MCs. The model has no idea what words actually mean, it's just spitting out a grammar it has learned. We are nowhere near the point where an AI can formulate an original thought and articulate it in rhyme, and I even called my model a 'parlour trick' in the post. It was just supposed to be fun.


OP and others may be interested in this approach by Sony CSL from ~2012, using constrained Markov chains to match lyrical style under explicit rhyme constraints - including, e.g., Bob Dylan's style constrained by the Beatles' "Yesterday" [0].

This is the precursor to the work that brought on "Daddy's Car" [1], though the techniques appear to have changed a bit, reading their recent publications. I find their paper on recent approaches to music synthesis with style matching or other constraints [2] quite readable, and really good! I am hoping to build out a version of what is described by the paper. There is also a very nice talk by F. Pachet on this subject here [3], and his older videos are good too (they should show up in YouTube's recommendations).
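
To give a flavour of the constrained-Markov idea, here's a very naive sketch - the paper does proper constraint propagation, not the rejection-style filtering below, and `crude_rhyme` is a toy placeholder:

  import random
  from collections import defaultdict

  def build_chain(corpus_lines):
      # Bigram Markov chain: word -> list of observed successors.
      chain = defaultdict(list)
      for line in corpus_lines:
          words = line.lower().split()
          for a, b in zip(words, words[1:]):
              chain[a].append(b)
      return chain

  def crude_rhyme(a, b):
      # Toy rhyme test: shared three-character suffix.
      return a[-3:] == b[-3:] and a != b

  def constrained_line(chain, seed, rhyme_target, length=8, tries=2000):
      # Rejection sampling: regenerate until the line's final word
      # satisfies the rhyme constraint.
      for _ in range(tries):
          words = [seed]
          for _ in range(length - 1):
              nxt = chain.get(words[-1])
              if not nxt:
                  break
              words.append(random.choice(nxt))
          if crude_rhyme(words[-1], rhyme_target):
              return " ".join(words)
      return None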

Still reading about DeepRhyme, but it seems great. Also a fantastic writeup!

[0] https://www.csl.sony.fr/downloads/papers/2012/barbieri-12a.p...

[1] https://www.youtube.com/watch?v=LSHZ_b05W7o

[2] https://arxiv.org/abs/1609.05152

[3] https://www.youtube.com/watch?v=5K4hn6cBUPU


Hey thanks! I will check this out tomorrow!


I seriously thought I was going to click on that link and see the stereotypical cliche rap:

Yo yo yo

I'm Deep Rhyme and I'm here to say

I figured out how to rap in a whole new way!

My name's D Prime and I'm an A-I

Maybe some day I'll calculate all of Pi!

Deep Rhyme in the house, yo!

(mike drop)


To be fair, that would be a far more impressive result than this!


Linked (but a little buried) in the post is this absolute standout of a YouTube video about the rhyme scheme in Eminem's Lose Yourself.

https://www.youtube.com/watch?v=ooOL4T-BAg0

Really drives home both how much more sophisticated hip hop has become and the incredible degree of difficulty in trying to simulate such sophisticated language structures.


Eminem has similar trickery going on in most songs, and that's probably one of the reasons for his success. Of course many other rappers do the same, and surely Eminem isn't the inventor of funky rhyme patterns.

It's been 15 years but I used to analyze the patterns. Brings back memories.


A rap lyrics generation algorithm using deep learning was also recently published [1] at the KDD conference, and it got quite a lot of publicity in other media: see the associated website at http://deepbeat.org.

[1] Malmi et al.: DopeLearning: A Computational Approach to Rap Lyrics Generation http://www.kdd.org/kdd2016/papers/files/adf0399-malmiA.pdf


Author of DeepRhyme here. With respect to the authors of DeepBeat, what they are doing is less ambitious. They are taking full existing lines out of a rap lyric corpus and assembling them into a verse that rhymes and makes sense. There is a paper that is very similar to what I did: http://www.emnlp2015.org/proceedings/EMNLP/pdf/EMNLP221.pdf . It's hard to compare our models, because they don't give much output text. They trained on 1% of the data I did, so I'm a bit dubious about how successful they could have been.

I am writing a followup post where I'm going to talk about previous work, I hope no one takes any disrespect.


>> It's hard to compare our models, because they don't give much output text.

The referenced paper doesn't give many examples of output, but the authors have made the system available online at the URL given above.


I think you're talking about DeepBeat - they aren't tackling the same problem, as I mentioned.


Instead of completely banning actual lines from other material, it might be interesting to allow D-Prime to quote or slightly modify a phrase if it met some high threshold of notability.


First, this is awesome. Great results.

I have been working on exactly this for the past few weeks. I went ahead and produced a song and a lyric video from mine (I call it Lil' Markov). It started out because I noticed patterns in Lil Wayne songs where he'd say some idiom and then rhyme it. It seemed so predictable I figured I could make a bot that automated Lil Wayne's process...

From what I can tell, our methods are very similar. I have a thing that allows you to input a concept and it will try to rap around that concept, and a `whiteGuilt()` function (that's what's up with all the ninjas), but other than that our process is 90% similar.

We should collaborate on something! Message me.

Version 0: https://soundcloud.com/cason-clagg/aesopxcasonxmarkov

Version 1: https://www.youtube.com/watch?v=fmcrjH6BKag

Version 2: (No music yet, but here's some sample lyrics based on the concept "beef")

FIND MYSELF WITH MILLIONS THE PIGGIES THAT'LL SHARE HOPPIN' AROUND SATURN'S RING I'M JUICED OFF DOG DARE

A DESCRIPTIVE LYRICIST LOUNGING IN LOVE LIKE YOU AIN'T EAGER TO KILL EACH ONE OF THE DOVES

HE RAN FOR A BEAST HE'S A PARAGON OF YORE DIDN'T THINK OF HIS BLOOD ON THE FACTORY FLOOR


As a huge hip-hop head, I will just add that there are two aspects to evaluating rhymes: how they sound when delivered by the artist, and how they read. The best rap succeeds in both domains. Bob Dylan won the Nobel Prize because he succeeds in both domains. In this case, we obviously can only consider the second domain, how it reads. So we should evaluate the results exactly how we would evaluate any other poem (while giving due respect to the conventions and tropes of the form, i.e., rap has its own classic metaphors and themes).


Don't know how good you are with cache invalidation

But your naming game is tight, that's without a question


I think you're on to something here! It might be a while before we train a NN that can out-rap the pros. But can we raise the bar for applying the style to alternative content domains to generate educational raps that aren't lame?


I wonder if approaches like this will work for poetry (or prose) from different cultures. I'd be especially interested to see if some of these techniques are effective on languages markedly different from English. (Maybe Chinese and Arabic?)


For that, they should consider the underlying universal metrics (or meters) rather than the local languages, along the lines of the work by Prof. Nigel Fabb in the UK https://www.amazon.co.uk/Meter-Poetry-Theory-Nigel-Fabb/dp/0... of which you have a review here https://web.stanford.edu/~kiparsky/Papers/fabb_halle_review....


This is great. Well done to OP! I've been considering creating something like this in the future as well.

You mentioned that you used the lyrics from 50,000 rap songs. Did you have any filters for quality or certain sub-genres? What era were the rap songs from? Maybe the 'quality' would be higher if the data set was narrowed to what you consider quality hip hop lyrics.

Also from your post: "You can see that D-Prime does internally rhyme to a small extent, but making the model better in this regard seemed like it was going to be tough, and I didn't pursue it." What were your main challenges in improving/pursuing better internal rhyme schemes?


Hey man, thanks! I didn't try to filter to any subgenres, but I did try to give the model as easy a time as possible by sticking to verses that used the most common words (lowering the perplexity of the corpus). I would call this genre 'mediocre', and that may explain why I had such a hard time with the sexism thing.
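
The filtering was along these lines - a sketch with made-up numbers, not the real pipeline:

  from collections import Counter

  def filter_verses(verses, vocab_size=20000, max_oov=0.05):
      # Keep verses that mostly stick to the corpus's most common words,
      # which lowers the perplexity the model has to deal with.
      counts = Counter(w for v in verses for w in v.lower().split())
      vocab = {w for w, _ in counts.most_common(vocab_size)}
      kept = []
      for v in verses:
          words = v.lower().split()
          oov = sum(w not in vocab for w in words)
          if words and oov / len(words) <= max_oov:
              kept.append(v)
      return kept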

Internal rhyming / assonance and other stuff could be done by rewarding the model for using it during the beam search. I suspect it's going to be hard to tune the reward so that the output still makes sense - it only just makes sense already. I also think at this point you would really need to sit down and figure out what rhymes and what doesn't. I did the quickest, hackiest thing I could think of, which was good enough for generating end rhymes, but if you want internal rhymes without massively constraining the model, I think you need to work it out purely from analysing the syllables.
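
If anyone wants to try the syllable route, the CMU pronouncing dictionary is the obvious starting point. Something like this, using the `pronouncing` package - a sketch of the idea, not what D-Prime actually does:

  import pronouncing

  def rhyming_part(word):
      # Phones from the final stressed vowel onwards, per cmudict.
      phones = pronouncing.phones_for_word(word.lower())
      return pronouncing.rhyming_part(phones[0]) if phones else None

  def perfect_rhyme(a, b):
      ra, rb = rhyming_part(a), rhyming_part(b)
      return ra is not None and ra == rb and a.lower() != b.lower()

  print(perfect_rhyme("through", "blue"))  # True: phonemes match, spelling doesn't
  print(perfect_rhyme("love", "move"))     # False: an eye rhyme only

Extending that to internal rhyme means scanning every syllable position rather than just line ends, which is where it gets hairy.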


> Maybe the 'quality' would be higher if the data set was narrowed to what you consider quality hip hop lyrics.

Also consider that the slang has changed across eras, too.

The example I often use is the phrase "word is bond" in the 90s which was replaced by "on God" in the 00s.

I wonder if you can get D-Prime trained off separate datasets.


I have to giggle a little bit at the implications of taking source material that's just about every negative *ist word it's possible to be and then using ML to generate a version without that content.


I think this is where we're supposed to invoke the intersectionality fairy.


This is awesome! I've been meaning to do something like this for a while.. great that someone else finally did it.

Any chance you'd open source it?


For a moment there I thought that thing had learned to rhyme on its own.

Then I figured out it's got some rules that help it along.

Phew. That'd have been scary.


I really want to drop in a Shakespeare play or two into the dataset and see what comes out.

MC MacB comin for yo, Dun-king.


Mad Magazine (or maybe it was Cracked) did a pretty good rap parody of Macbeth years ago, maybe the late '80s or early '90s.


Somehow these AI projects always seem to demonstrate how real intelligence is better.


Baby steps. We all were babies once.


It would be fascinating to try training it on some HRC and Trump speeches. I am doubtful most people would be able to tell the difference between generated remarks and the real thing.


...does enough chap-hop exist to form a decent dataset?

https://www.youtube.com/watch?v=0iRTB-FTMdk


I'd like to see you team up with some AI beat generators and then use text-to-speech to produce albums.


Next, you should plug this into a text-to-speech program and play it with a simple beat generator.


Looking forward to the time

When Siri can spit

Some dope rhymes


Next stop: pun recognizer or pun generator.


How will we know the AI versions?


Awesome!


D-Prime is an incredibly good AI MC name

Whoever thought that one up deserves free lunch in perpetuity


No Free Lunch ;)


I'm sensing an impending gangsta job shortage. I wonder if machines become better at being outlawz, then might these individuals pursue some form of higher street education to keep up?



