This is great. Well done to OP! I've been considering creating something like this in the future as well.
You mentioned that you used the lyrics from 50,000 rap songs. Did you have any filters for quality or certain sub-genres? What era were the rap songs from? Maybe the 'quality' would be higher if the data set was narrowed to what you consider quality hip hop lyrics.
Also from your post: "You can see that D-Prime does internally rhyme to a small extent, but making the model better in this regard seemed like it was going to be tough, and I didn’t pursue it."
What were your main challenges in improving/pursuing better internal rhyme schemes?
Hey man, thanks! I didn't try to filter to any subgenres, but I did try to give the model as easy a time as possible by keeping only verses that stuck to the most common words (which lowers the perplexity of the corpus). I would call the resulting genre 'mediocre', and that may explain why I had such a hard time with the sexism thing.
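For anyone wondering what a "stick to the most common words" filter might look like, here is a minimal sketch. It is not the code used for the project; the function name `filter_common_verses` and the `vocab_size` and `min_coverage` parameters are made up for illustration, and whitespace tokenization is a simplifying assumption.

```python
from collections import Counter


def filter_common_verses(verses, vocab_size=5000, min_coverage=0.95):
    """Keep verses whose words mostly come from the corpus's most common words.

    vocab_size and min_coverage are hypothetical knobs, not values from the post.
    """
    # Count every whitespace-separated token across the whole corpus.
    counts = Counter(w for v in verses for w in v.lower().split())
    common = {w for w, _ in counts.most_common(vocab_size)}

    kept = []
    for verse in verses:
        words = verse.lower().split()
        if not words:
            continue  # skip empty verses
        # Fraction of this verse's tokens that are in the common vocabulary.
        coverage = sum(w in common for w in words) / len(words)
        if coverage >= min_coverage:
            kept.append(verse)
    return kept


verses = ["the cat sat", "the cat ran", "the zygote quivered"]
print(filter_common_verses(verses, vocab_size=2, min_coverage=0.6))
```

Dropping verses with rare words shrinks the effective vocabulary, which is one plausible way to end up with a lower-perplexity corpus, at the cost of losing the more inventive lyrics.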
Internal rhyming, assonance and the like could be done by rewarding the model for using them during the beam search. I suspect it's going to be hard to tune the reward so the output still makes sense; it only just makes sense already. I also think at this point you would really need to sit down and figure out what rhymes and what doesn't. I did the quickest, hackiest thing I could think of, which was good enough for generating end rhymes, but if you want to have internal rhymes and not massively constrain the model, I think you need to work it out purely from analysing the syllables.
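To make the reward idea concrete, here is a rough sketch of adding a rhyme bonus to beam candidates. Everything in it is an assumption for illustration: `rhyme_tail` is a crude spelling-based stand-in for real syllable analysis (it will wrongly match words like "time" and "me"), `reward` is a hypothetical weight, and the candidates are assumed to arrive as `(line, log_prob)` pairs. It re-ranks finished candidates; the same bonus could in principle be applied to partial hypotheses at each beam step.

```python
from collections import Counter

VOWELS = "aeiou"


def rhyme_tail(word):
    """Crude rhyme key: from the last vowel group to the end of the word."""
    w = "".join(c for c in word.lower() if c.isalpha())
    i = len(w) - 1
    while i >= 0 and w[i] not in VOWELS:  # skip trailing consonants
        i -= 1
    while i > 0 and w[i - 1] in VOWELS:  # extend back over the vowel group
        i -= 1
    return w[i:] if i >= 0 else w  # no vowels found: use the whole word


def internal_rhyme_count(line):
    """Count distinct words in the line that share a rhyme key with another."""
    distinct = {"".join(c for c in w.lower() if c.isalpha()) for w in line.split()}
    distinct.discard("")
    groups = Counter(rhyme_tail(w) for w in distinct)
    return sum(n - 1 for n in groups.values())


def rerank(candidates, reward=0.5):
    """Sort (line, log_prob) pairs by log_prob plus a per-rhyme bonus."""
    scored = [(lp + reward * internal_rhyme_count(line), line) for line, lp in candidates]
    return [line for _, line in sorted(scored, key=lambda s: s[0], reverse=True)]


beams = [("the dog ran far away", -1.0), ("the cat in the hat sat", -1.5)]
print(rerank(beams, reward=0.5))
```

The tuning problem mentioned above shows up directly in `reward`: set it too high and the search will prefer rhyming nonsense over lines the language model actually considers likely.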