Rappers, Sorted by Size of Vocabulary

loso · on May 4, 2014

I enjoyed reading this chart, but I hope it doesn't reinforce the bias some fans have that word complexity is the only way to tell whether a rapper is good. There are several ways to judge the strengths and weaknesses of a rapper. Complexity is one of them, flow is another. Storytelling ability is another very strong indicator. The best rappers bring a mix, while some are so strong in one area that they blow up even if they're really weak in others.

harryh · on May 4, 2014

To fully understand rap, we must first be fluent with its meter, rhyme, and figures of speech. Then ask two questions: One, how artfully has the objective of the song been rendered, and two, how important is that objective. Question one rates the song's perfection, question two rates its importance. And once these questions have been answered, determining a song's greatness becomes a relatively simple matter.

If the song's score for perfection is plotted along the horizontal of a graph, and its importance is plotted on the vertical, then calculating the total area of the song yields the measure of its greatness.

octo_t · on May 5, 2014

god. fucking. damn.

rips page out of book

makaveli8 · on May 5, 2014

I completely agree.

hatu · on May 4, 2014

It's kind of like putting book authors in this kind of a list. Using more words and more complex ones doesn't really say anything about the quality of the writing. Pretty interesting dataset for fun, though.

throwaway13qf85 · on May 5, 2014

William Faulkner once criticised Ernest Hemingway by saying that he "had never been known to use a word that might send the reader to the dictionary."

Hemingway responded by saying, "Poor Faulkner. Does he really think big emotions come from big words? He thinks I don't know the ten-dollar words. I know them all right. But there are older and simpler and better words, and those are the ones I use."

anrope · on May 5, 2014

"Give a dog a bone, leave a dog alone..."

There's too much hate for DMX in these comments. Dude has some tracks that just ooze energy.

danielsf · on May 5, 2014

I never made the claim that complexity is better/worse. I just wanted to communicate the data point, not imply 2pac>DMX.

unfunco · on May 4, 2014

This is fascinating. I'm only a recent listener of hip-hop (primarily because of Earl Sweatshirt and Odd Future) and I'm in awe of the vernacular.

Similarly, as a boredom exercise a few weeks ago, I did some lexical analysis of the song Timber (the monstrosity was being played constantly on the radio at the time), and here's what I came up with:

"83.1% of the words in the lyrics are five letters or less, 58.9% are four letters or less. The lexical density (the number of unique words divided by the total number of words, multiplied by one-hundred) is 29.1%. There is only one word in the song which has three or more syllables. Eleven people were involved with the writing of the song, each of them capable of producing just nine unique words each."
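The figures in that quote are straightforward to reproduce. Below is a minimal Python sketch, assuming a simple regex tokenizer (lowercase words made of letters and apostrophes, with apostrophes counted toward word length); the original analysis may have tokenized differently, so exact percentages would vary. The function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and pull out word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def lexical_stats(text):
    """Share of short words and lexical density, all expressed as percentages."""
    words = tokenize(text)
    total = len(words)
    if total == 0:
        return {"five_or_less": 0.0, "four_or_less": 0.0, "lexical_density": 0.0}
    return {
        "five_or_less": 100 * sum(len(w) <= 5 for w in words) / total,
        "four_or_less": 100 * sum(len(w) <= 4 for w in words) / total,
        # As defined in the quote: unique words / total words * 100.
        "lexical_density": 100 * len(set(words)) / total,
    }

print(lexical_stats("rhythm and rhyme and rhythm again"))
```

For the six-word sample line, four of the six tokens are distinct, so the density comes out at about 66.7%.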

raydev · on May 4, 2014

> Eleven people were involved with the writing of the song, each of them capable of producing just nine unique words each.

I'm not sure why this is notable when you consider that lyrics are probably the least important aspect of a song intended for the top 40.

If you take a moment to listen to the melody and production, you'd probably see why it's credited to 10+ people. That song is a well-oiled machine.

unfunco · on May 4, 2014

My last sentence was intended as satire; lyrics are obviously not uniformly distributed among writers. I completely agree with you, though: the song (despite my disdain for it) is incredibly catchy, and definitely not intended to be thoughtful or thought-provoking.

Avshalom · on May 5, 2014

Is that with or without 'the', 'be', 'to'? Not that it's any sort of literary accomplishment regardless, but English is a terrible language for lexical density.
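The question matters because function words repeat constantly and drag the ratio down. A Python sketch, assuming a tiny hand-picked stopword list (hypothetical, for illustration only; NLTK ships a fuller English list in `nltk.corpus.stopwords`), showing how much the choice moves the number:

```python
import re

# Hypothetical, deliberately tiny stopword list; real analyses use a fuller one.
STOPWORDS = {"the", "be", "to", "a", "and", "of", "i", "it", "is", "in"}

def lexical_density(text, drop_stopwords=False):
    """Unique words / total words * 100, optionally ignoring stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    if drop_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    if not words:
        return 0.0
    return 100 * len(set(words)) / len(words)

line = "the cat and the hat and the bat"
print(lexical_density(line), lexical_density(line, drop_stopwords=True))  # 62.5 100.0
```

The same line goes from 62.5% to 100% just by excluding "the" and "and", so any density figure needs to say which convention it uses.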

danielsf · on May 5, 2014

this is brilliant. OP here. We should work together.

philtar · on May 4, 2014

You must be using big words in case someone does the analysis on HN comments.

Definitions:
Vernacular: ??
Lexical analysis (in this case): ratio of unique words to non-unique words
Lexical density: What percent of the words is unique?

unfunco · on May 4, 2014

The paragraph in quotes is copied and pasted from when I wrote that a few weeks ago, there's a definition in parentheses following lexical density. Analysis is a word that should not need a definition attached to it (you have used it yourself in your comment.)

Vernacular is commonly used in the United Kingdom. Google will provide you with a definition.

bretthopper · on May 4, 2014

Looked for Canibus near the top and wasn't surprised to find him 4th. If anyone hasn't heard of him, I highly suggest listening to his older stuff, such as his first album Can-I-Bus, 2000 BC and Mic Club.

He raps about science and space all the time which is cool.

Here's an example of his ridiculous lyrics: http://rapgenius.com/Canibus-poet-laureate-infinity-lyrics

shawnz · on May 4, 2014

Additionally, many HN users have probably already heard Canibus rapping even if they don't know it, since he wrote the Office Space theme song. :)

Oxxide · on May 4, 2014

Always loved that song.

My personal favorite Canibus track is Master Thesis, though.

StevenNunez · on May 5, 2014

Yay! I'm not the only Canibus fan! Seriously though, Mic Club is awesome.

seizethecheese · on May 4, 2014

Many here seem to be interpreting vocabulary size as a signal for quality. When it comes to rap I completely disagree. Firstly, the repetition is rap's main ingredient. I read an article a while ago where researchers found that listening to a spoken phrase that is looped activates the same part of the brain as music, which helps explain this phenomenon.

Personally, if I want food for thought I read. Rap is not an intellectual pursuit. I've been perusing rappers on this list, and the top artists have not been good at all to my ears. It seems that the best rappers are in the middle, and being on either extreme is a negative signal.

TheCoelacanth · on May 4, 2014

> It seems that the best rappers are in the middle, and being on either extreme is a negative signal.

Objectively, I don't think that is an accurate assessment. There are highly respected artists all across the chart, including the extremes. For instance, Wu-Tang Clan and its members at the upper extreme and Kanye, Snoop and Tupac at the lower extreme.

seizethecheese · on May 5, 2014

Yeah I agree, that statement was definitely a stretch. There are well respected rappers across the spectrum, which is my main point: that the interpretation of the chart as a proxy for quality is wrong.

pandler · on May 4, 2014

> Firstly, the repetition is rap's main ingredient.

Is it though? I don't dispute that sheer vocabulary size isn't a sign of quality, but that seems like a very ignorant generalization of rap.

TheCoelacanth · on May 5, 2014

Repetition is a key ingredient in all genres of music. You would be hard pressed to find many significant pieces of music that don't use repetition.

seizethecheese · on May 5, 2014

I think saying it's the "main" ingredient is debatable but it surely is very important.

cruise02 · on May 5, 2014

> Firstly, the repetition is rap's main ingredient.

Sure, within one song. But if an artist has a narrow vocabulary throughout their career, it's a sign that they're just writing the same song over and over.

danielsf · on May 5, 2014

I never made the claim that vocab size was correlated with quality.

seizethecheese · on May 5, 2014

I'm addressing what seems to be a common interpretation, not an explicit claim.

dools · on May 5, 2014

It's mostly the voice...

Aardwolf · on May 4, 2014

> Shakespeare’s vocabulary: across his entire corpus, he uses 28,829 words, suggesting he knew over 100,000 words

Why does that suggest he knew over 100k words? Maybe it means he knew 28,829 and used all of them? Would he really know over 70,000 words he never used in his works? What would those 70,000 words be? Probably very obscure ones. How can you know that many obscure ones?

mbillie1 · on May 4, 2014

Vocabulary has for a while been considered in terms of 'receptive' and 'productive' capacity, with the assumption being that one's 'receptive' vocabulary can be larger, since it is easier to hear/read/understand a word than it is to use it correctly in speech/writing (this is not necessarily the popular opinion anymore [http://www.readingconnect.net/web/FILES/english-language-and...] but may provide the context for the claim about Shakespeare). The notion is that you are able to understand more words than you commonly use in your speech/writing, which is on some level intuitive, although of course it is an empirical question.

maaku · on May 4, 2014

Assumptions tend to break down at the extremes.

E.g. Shakespeare actually invented a lot of the obscure words he used.

MereInterest · on May 4, 2014

I imagine it would be something similar to the German Tank Problem (http://en.wikipedia.org/wiki/German_tank_problem). Treating each work as a sample of the words the author knows would then allow an estimate of the total number of words known. This would need to be modified to account for the non-uniform distribution of word use, but the principle would be the same.
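For reference, here is the standard frequentist estimator from the German tank problem, sketched in Python: given k serial numbers sampled without replacement from 1..N, with largest observed value m, the minimum-variance unbiased estimate of N is m + m/k - 1. Using it for vocabulary would require some hypothetical way of giving words "serial numbers" and, as noted above, a correction for non-uniform word use; this sketch covers only the basic estimator.

```python
def german_tank_estimate(serials):
    """Estimate N from serials drawn without replacement from 1..N: m + m/k - 1."""
    if not serials:
        raise ValueError("need at least one observed serial number")
    k = len(set(serials))  # sampling is without replacement, so count distinct values
    m = max(serials)
    return m + m / k - 1

print(german_tank_estimate([19, 40, 42, 60]))  # 60 + 60/4 - 1 = 74.0
```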

coherentpony · on May 4, 2014

Again, I'd like to point out that Shakespeare also made up words: http://www.shakespeare-online.com/biography/wordsinvented.ht...

e12e · on May 5, 2014

Relevant video:

Kate Tempest - My Shakespeare: https://www.youtube.com/watch?v=i_auc2Z67OM

;-)

sroerick · on May 4, 2014

The 28K figure is achieved by counting multiple spellings of the same word as separate words. Shakespeare lived before dictionaries, so there was never a single standard way to spell a word.
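A small Python illustration of that effect: the same tokens give a different "vocabulary size" depending on whether spelling variants are folded together first. The variant map is a hypothetical three-entry example, not a real normalization table for Early Modern English.

```python
# Hypothetical spelling-variant map (variant -> normalized form), for illustration only.
VARIANTS = {"murther": "murder", "shew": "show", "onely": "only"}

def count_types(words, variants=None):
    """Count distinct word types, optionally mapping spelling variants to one form first."""
    variants = variants or {}
    return len({variants.get(w.lower(), w.lower()) for w in words})

words = ["murder", "murther", "show", "shew", "only", "onely", "king"]
print(count_types(words), count_types(words, VARIANTS))  # 7 4
```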

Crito · on May 4, 2014

I'm curious if words that Shakespeare invented count. There are many words that we see first used by Shakespeare, though some of them were probably words invented during his time by others with him merely being the first to record (in documents that survived until today).

rquantz · on May 4, 2014

The latter is probably more the case. The OED, for instance, has had a bias in favor of using Shakespeare as a word's origin since that dictionary's first edition. The number of words attributed to Shakespeare in the OED has dwindled over time.

jemfinch · on May 4, 2014

Just like you or I know tens of thousands of words, but only use some small subset of them in any given work, you wouldn't expect that Shakespeare would use his entire productive vernacular in producing the limited corpus of his literary works.

nmac · on May 4, 2014

It's a nice touch including portmanteaus and 'incorrect' ebonics on the list (like "ery'day"), since authors like Shakespeare, Joyce and others took the same liberties with language. Arguably, that's how language develops, and it's what makes language interesting to study and think about. The OP could easily have stuck to words in the OED, so kudos.

danielsf · on May 5, 2014

op here: thanks yo!

krick · on May 4, 2014

Really interesting, but not as representative as it should be. It's not clear why some artists have a larger vocabulary than others. It could come from words like "zeitgeist" (in the case of Aesop Rock), some clever wordplay (I don't know much about hip-hop, so I can't find an example from an artist on the list right off the bat, but I remember Marilyn Manson using the word "gloominati", for instance), pretty meaningless made-up words like "schizzle" (in the case of Snoop Dogg), or the usual derivatives like "fuckedy fuck". Moreover, in many hip-hop transcripts people write down words as they are pronounced, which can be quite distorted for some artists (ideally those shouldn't count as "new words", but that's complicated, yeah).

While Aesop Rock and DMX are clearly extremes and not surprising at all, it's not that clear for some of the guys in the middle.

So, first off, every data project should provide its sources, or at least a more specific description of how the text was processed, tokenized and analyzed. Second, several more "data slices" should be provided, for instance the 100 most used words that are unique to each artist compared to the other artists in the list.
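The second slice krick suggests could look something like this Python sketch, assuming each artist's lyrics are already tokenized into a list of words (the input shape and the function name are assumptions). "Unique to that artist" is read here strictly as "used by no other artist in the list"; a softer alternative would be a frequency ratio or TF-IDF, which this does not implement.

```python
from collections import Counter

def distinctive_words(lyrics_by_artist, artist, n=100):
    """Most frequent words used by `artist` that no other artist in the dict uses."""
    counts = Counter(lyrics_by_artist[artist])
    others = set()
    for name, words in lyrics_by_artist.items():
        if name != artist:
            others.update(words)
    # most_common() sorts by count; ties keep first-seen order.
    exclusive = [(w, c) for w, c in counts.most_common() if w not in others]
    return exclusive[:n]

data = {
    "A": ["zeitgeist", "zeitgeist", "word", "lexicon"],
    "B": ["word", "yo", "yo"],
}
print(distinctive_words(data, "A"))  # [('zeitgeist', 2), ('lexicon', 1)]
```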

duney · on May 5, 2014

The example you used for clever wordplay, "gloominati," is actually considered a portmanteau word. It's the result of combining multiple words to create a new word. (I say this not to be a pedant, but because I learned the term recently and was amused that we actually have a word for it.)

krick · on May 5, 2014

Yes, portmanteaus are exactly what I meant by "clever wordplay". English isn't my primary language, so it's hard to remember the right word sometimes, sorry for that. :)

danielsf · on May 5, 2014

OP here. Do I really need to provide all of this to satisfy the reader's ability to grasp the basic premise of the site? this isn't a thesis or academic pursuit, just comparing some rappers for fun.

I used plain NLTK token analysis on rap genius lyrics. in terms of several more data slices...I agree that there should be more cuts of the data, but you must understand the amount of time that it took me to put this together.
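The exact pipeline isn't spelled out beyond NLTK tokenization on Rap Genius lyrics. A minimal Python sketch of the kind of count the chart implies, assuming each artist's songs are processed in release order and truncated to a fixed window (35,000 words, the figure that comes up later in the thread). A regex tokenizer stands in for NLTK so the example runs without extra downloads; `nltk.word_tokenize` would split contractions and punctuation differently, which changes the counts.

```python
import re

WINDOW = 35_000  # assumed window size, taken from later in the thread

def unique_in_window(songs, window=WINDOW):
    """Count distinct tokens among the first `window` tokens of an artist's catalog."""
    tokens = []
    for lyrics in songs:  # songs assumed to be in release order
        tokens.extend(re.findall(r"[a-z']+", lyrics.lower()))
        if len(tokens) >= window:
            break
    return len(set(tokens[:window]))

print(unique_in_window(["a b c", "a d e"], window=4))  # 3
```

If an artist's whole catalog is shorter than the window, the count covers fewer words and isn't directly comparable with the others.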

krick · on May 5, 2014

Of course it's entirely up to you what to provide. It would be silly of me to question that. I'm not paying you for that, so how can I demand anything? I'm just saying what I think should've been done. That's how I would do it, at least.

You see, I have a strong opinion that any data-analysis work is pretty close to useless if it's not reproducible. And I mean really close. I already mentioned a few questions that naturally arise when reading your article, which are crucial to understanding your results and are not addressed by the article. So, ideally, any data analysis done for the open community should provide both the full dataset and the full sources. Unfortunately, that's not always possible: the dataset may be several terabytes, there might be legal issues around disclosing the data (or letting everyone know exactly how the author acquired it), sources might be under NDA, whatever. In that case the description should be as pedantic as possible, because little details can change the whole meaning of the statistics.

By the way, NLTK allows you to do all kinds of processing, so it isn't the answer.

So what would I do? The usual course, actually: I'd do the work in an IPython Notebook and clean it up afterwards, drawing graphs in place and printing the few slices I mentioned while processing, so it's easier to see what the counted unique words actually look like. Fancy d3 graphs are cool for sure, but not nearly as useful.

coherentpony · on May 4, 2014

Maybe this is just me, but it's a little unfair to compare to literary texts.

Humour me for a moment.

When an artist writes a song, he (or she) has constraints. Most rappers would like to rhyme the ends of their sentences. I know sometimes they don't (like poetry), but it's certainly pleasing to the ear to have that constraint. Artists endeavour to make their songs catchy; that's highly correlated with the gross sales of the product.

When an artist writes a novel, this constraint is not weighted quite as highly. I know Shakespeare wrote poetry, too, and to call me out on this comparison is entirely fair. That said, there's also an argument to be made for eye rhymes. Shakespeare used these a lot. Eye rhymes are words that don't rhyme aurally, but do rhyme visually. It's the story that pleases the reader, not necessarily its aural 'catchiness'. I probably made that word up. But Shakespeare made words up too. The point is, you knew what I meant.

At the end of the day these comparisons, while certainly interesting, should be taken with a pinch of salt. While I'm at it, this advice can easily be extrapolated to any dataset. Always understand there may be unknown correlations.

danielsf · on May 5, 2014

OP here: the shakespeare thing is really just a hook, food for thought rather than an academic/cultural judgement.

I also had several suggestions to use shakespeare's sonnets rather than plays, which I should have done.

and yes, this is all just pinch-of-salt barbershop discussion :)

thinkpad20 · on May 4, 2014

Is Del tha Funkee Homosapien on this list? I'd be curious, since he has pretty non-standard lyrics.

dbrian · on May 4, 2014

The author promised Reddit that he would be added.

http://www.reddit.com/r/Music/comments/24omhw/rappers_sorted...

Xcelerate · on May 4, 2014

I'd never heard of Aesop Rock before and decided to check out some of his music. He sounds a lot like Del.

thrownaway2424 · on May 4, 2014

They are also on the same label. Definitive Jux has some quality product.

If you're looking for expansive vocabularies you should consider exploring other dorky rappers like Scroobius Pip.

benihana · on May 4, 2014

I think I remember reading Aesop say Del influenced him a great deal. If you like Aesop, check out El-P, Cannibal Ox, Illogic, and basically anyone who was associated with Def-Jux.

habosa · on May 4, 2014

Not surprised to see Wu Tang at the top and Drake at the bottom. Started from the bottom ... still there.

pandler · on May 4, 2014

Haha, I was thinking that the further left you move on the scale, the more likely you are to see rappers that people tend to mock.

orblivion · on May 4, 2014

This looks at the first so many words of each rapper's career. Aesop Rock came out with some weird stuff right off the bat. I wonder if some of these other rappers became more sophisticated over time. Maybe an average per song, or average uniques per word, would be better.

sfrank2147 · on May 4, 2014

The problem with average per song is that you "use up" words in every new song, so all things being equal each marginal song has progressively fewer new words.

plorg · on May 4, 2014

I bet you could get something insightful from plotting "unique words" versus "total words" - That might give a good idea of the amount of repetition over time, the length or quantity of output, and the total vocabulary.
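Producing the points for that plot is simple. A Python sketch, assuming one artist's lyrics are concatenated in release order into a single string (the function name and the sampling `step` are illustrative choices): it records how many distinct words have appeared after every `step` tokens, plus a final point at the end of the text.

```python
import re

def vocabulary_growth(text, step=1000):
    """Return (total_words, unique_words) pairs sampled every `step` tokens."""
    words = re.findall(r"[a-z']+", text.lower())
    seen, points = set(), []
    for i, word in enumerate(words, start=1):
        seen.add(word)
        if i % step == 0 or i == len(words):
            points.append((i, len(seen)))
    return points

print(vocabulary_growth("a b a c a b", step=2))  # [(2, 2), (4, 3), (6, 3)]
```

Plotting the pairs (with matplotlib, say) gives one curve per artist; a curve that flattens early means the artist keeps reusing the same words, which is the "use up" effect described above.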

danielsf · on May 5, 2014

here's what this looks like. ugly as sin and useless for comparing rappers.

http://www.mdaniels.com/vocab/scatter.png

love your other ideas – hopefully can do them later.

jebus989 · on May 4, 2014

Strange comment, you realise that's not an inherent truth of language? Unique words per song is trivial to calculate

michaelt · on May 4, 2014

If a rapper released one song using n distinct words their score would be n/1, and if they released a second song using the same set of words their score would halve, to n/2, despite the fact their demonstrated vocabulary is still n words.

In fact, if their first song used n distinct words and their second used a completely distinct set of words, but the second song was shorter than the first, their score would drop.

That would be unusual behaviour for a measure of vocabulary.

thinkpad20 · on May 4, 2014

I don't think that's what the poster meant. By "average unique words per song" I take it to mean, within each song words are only counted once, but across songs, words can be counted multiple times. So if song A had the words "I like cats" and song B had the words "I like dogs", then the average unique word count would be ((3 + 3) / 2) = 3, not ((3 + 1)/2) = 2.
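The two readings are easy to compare in code. A Python sketch using plain whitespace splitting (an assumption; real lyrics would need proper tokenization), with the cats/dogs example from the comment:

```python
def average_unique_per_song(songs):
    """Distinct words within each song, averaged; words may be recounted across songs."""
    return sum(len(set(song.split())) for song in songs) / len(songs)

def average_new_per_song(songs):
    """Words each song adds to the running vocabulary, averaged over songs."""
    seen, new_counts = set(), []
    for song in songs:
        words = set(song.split())
        new_counts.append(len(words - seen))
        seen |= words
    return sum(new_counts) / len(new_counts)

songs = ["I like cats", "I like dogs"]
print(average_unique_per_song(songs), average_new_per_song(songs))  # 3.0 2.0
```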

jeorgun · on May 4, 2014

That's definitely one solution, but it still wouldn't quite capture it. As an extreme example, if rapper A produced 100 songs, each with exactly the same lyrics, they should surely be penalized compared with rapper B producing 100 songs with no shared words— even if rapper A's average unique-words-per-song is higher than rapper B's.
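The extreme case can be checked with synthetic data. A Python sketch with two hypothetical catalogs built from placeholder tokens: rapper A releases the same 10-word song 100 times, rapper B releases 100 five-word songs that share no words. The per-song average favors A, while a career-wide count of distinct words clearly favors B.

```python
def average_unique_per_song(songs):
    """Distinct words within each song, averaged over songs."""
    return sum(len(set(song.split())) for song in songs) / len(songs)

def career_vocabulary(songs):
    """Distinct words across the whole catalog."""
    return len({word for song in songs for word in song.split()})

# Hypothetical catalogs built from placeholder tokens.
rapper_a = [" ".join(f"w{i}" for i in range(10))] * 100
rapper_b = [" ".join(f"s{j}_{i}" for i in range(5)) for j in range(100)]

print(average_unique_per_song(rapper_a), average_unique_per_song(rapper_b))  # 10.0 5.0
print(career_vocabulary(rapper_a), career_vocabulary(rapper_b))  # 10 500
```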

iLoch · on May 4, 2014

I agree, perhaps the 35,000 most recent words would be better.

danielsf · on May 5, 2014

OP here: the challenge is that most artists' best work is in their earlier years. I'd rather have Jay-z's first album than last, ya know?

jeorgun · on May 5, 2014

Would sorting by popularity, or critical acclaim, or something along those lines be a possibility?

randomdrake · on May 4, 2014

For those who aren't familiar with Aesop Rock, I'd invite you to give him a listen sometime. His earlier albums, in particular, have been very influential to me in many ways. Both in my artistic and professional careers.

From comments on the conditions of the working man and the condition of feeling trapped in a "j-o-b"[1]:

   "Now we the American working population
   Hate the fact that eight hours a day
   Is wasted on chasing the dream of someone that isn't us
   And we may not hate our jobs
   But we hate jobs in general
   That don't have to do with fighting our own causes
   We the American working population
   Hate the nine-to-five day-in day-out
   When we'd rather be supporting ourselves
   By being paid to perfect the pastimes
   That we have harbored based solely on the fact
   That it makes us smile if it sounds dope"

To storytelling masterpieces regarding living and dreaming[2]:

   "Look, I've never had a dream in my life
   Because a dream is what you wanna do, but still haven't pursued
   I knew what I wanted and did it till it was done
   So I've been the dream that I wanted to be since day one!"

Aesop Rock takes language and linguistics to entirely different levels than one might expect from the single genre that is hip-hop. He even challenges himself and the listeners, playing fantastic word games, for instance re-using the letters L, S, and D in odd and rhythmical ways after a mention[3]:

   "Lazy summer days
   Like some decrepit landshark dumb luck squad dog lurks sicker deluded
   Last sturdy domino lean's secluded
   Don't let stupid delusions lesson super-duty labor students
   Dragnet lifer solutions
   Daddy loved sloppy dimensions like son-daughter links
   Such determinated lepers, successfully disheveled
   Little soldiers developed like serpents despite life sentence ducking
   Lemmings
   Some don't like sobriety's dirty lenses
   Some do"

And then there are just incredible gems that stick with you like[4]:

   "I don't flick needles like my sick friend
   I don't march like Beetle Bailey through a quick trend
   I don't frequent church's steeples on my weekend
   And I don't comment if you formulate a weak Zen"

There's a lot to explore from Aesop Rock. Should you find this type of hip-hop interesting, a decent place to start is with the label you can find these songs on, Definitive Jux[5]. Incredible talent has been on and off that label over the years. So much good stuff.

[1] - "9-5ers Anthem" - http://rapgenius.com/Aesop-rock-9-5ers-anthem-lyrics

[2] - "No Regrets" - http://rapgenius.com/Aesop-rock-no-regrets-lyrics

[3] - "The Greatest Pac-Man Victory in History" - http://rapgenius.com/Aesop-rock-the-greatest-pac-man-victory...

[4] - "Save Yourself" - http://rapgenius.com/Aesop-rock-save-yourself-lyrics

[5] - http://en.wikipedia.org/wiki/Definitive_Jux

leorocky · on May 4, 2014

I don't know man, I listened to a couple of the tracks and he definitely has lyrical skills, and I like some of the tracks, but the quotes you selected aren't very good at all, at best obvious topics with all the insight of a million college freshmen. Having said that I like "None Shall Pass" that has a really great sound.

To be entirely honest, I love rap, but not for any insight rappers have in world affairs, but for their lyrical ability. Some are very good at providing unique ways to describe their own insights about their lives but when someone starts rapping about world problems I just want to shut my brain off because it's usually pretty banal. Then with my brain off I can still at least enjoy the way the rap sounds.

randomdrake · on May 4, 2014

> at best obvious topics with all the insight of a million college freshmen

Art is weird like that.

We have to remember that it isn't all about needing to learn something new from the experience. Sometimes it's just about getting something out of it.

Looking over the lists of the best songs of all time[1], we can see that there aren't a lot of incredibly insightful songs. Quite frankly, most speak of your "obvious topics" and probably don't talk about them with any sort of magnificent linguistic grandeur.

But that doesn't mean they aren't great songs and don't offer their listener an experience worth sharing and repeating for generations.

[1] - http://en.wikipedia.org/wiki/List_of_songs_considered_the_be...

nl · on May 5, 2014

Songs, like poetry, often aren't exactly insightful, but they provide a good way of communicating the emotions connected with an experience or concept.

Take "Smells Like Teen Spirit" (which appears on 6 of the lists linked to above): the lyrics are not particularly insightful (almost deliberately), but it captures the aimlessness of disaffected youth like no other song.

"Nothing Compares 2 U" really is unrequited love.

Then there's U2's One (3 lists) which has lyrics that mean different things to different people, and music that can support multiple interpretations[1].

(Of course, some songs on those lists are entertaining just because they are perfect pop ("Billie Jean", "Like a Virgin") or funny ("Baby Got Back").)

[1] http://www.pophistorydig.com/?tag=u2-one-song-history

vacri · on May 4, 2014

I think the point is that giving lengthy quotes of the lyrics is a little pointless if the merit in the art isn't the insight of the lyrics.

qwerty_asdf · on May 4, 2014

I think the term you're grasping for is "Lowest Common Denominator."

LukeShu · on May 4, 2014

I don't disagree that there is a lot of catering to the lowest common denominator in the music industry, both in pop music and the "best songs of all time" lists.

However, I think the parent post has a point. (Now, I'm having a hard time figuring out how to effectively articulate it) The point of music isn't necessarily an insight. Listening to music is an experience, which is about how it makes you feel. Sometimes part of that is giving an insight, sometimes it isn't. Often, it is about combining an idea or concept with a performance or presentation; the idea/concept doesn't need to be insightful to be effective.

catshirt · on May 4, 2014

exactly. if you connect with art based on how "obvious" you find it, you are going to have a very shallow and boring art career.

sometimes being obvious is what makes it art in the first place. hell, some art needs to be obvious.

qwerty_asdf · on May 5, 2014

The inverse of broad appeal is the concept of The Long Tail, where there's a vast array of niche artists, each appealing to a small number of people.

https://en.wikipedia.org/wiki/Long_Tail

There's a book by the same title, written by Chris Anderson:

http://www.thelongtail.com

Prior to the internet, when advertising and media outlets were centralized, and retail businesses were distributed geographically, it was very difficult to gain a large following with niche appeal. But now that the internet has inverted the scenario, with decentralized, global exposure, and centralized marketplaces like eBay and Amazon, niche artists have a fighting chance at becoming famous within their genre.

In other words, it used to be that the only way to catch some exposure was to appeal to centralized broadcasting networks, and they only took chances on performers who were low risk. Now, with the internet, risk doesn't really matter, and mass appeal is literally measured by the size of your following. The larger your following is, by default, the more compromises you'll have made to appeal to everyone following you.

If you capture 1/2 of the world as your audience, then you appeal to a broader and more diverse audience, whose members have less in common with each other than if you had captured 1/4 of the world. It's the difference between getting half the world to agree on something and creating something that three quarters of the world cannot relate to.

So, Aesop Rock raps about hating your boss, and many people say: "Gee, yeah, I hate my boss too! This guy's awesome!", but Kool Keith raps about Kenworths with wings, and lots of people are like: "Is he weird?" because lowest common denominator.

chaired · on May 5, 2014

There isn't a doubt in my mind that if someone came along and said what the actual grand structure and meaning of reality is, most people on the Internet would dismiss it as college stoner thoughts out of hand.

cgag · on May 4, 2014

I don't know, maybe it's because I didn't grow up in HackerNews social circles, but I always felt like I was the only one who thought the concept of a "dream job" was disgusting, and hated jobs in general. I think I'll probably enjoy that song even if Aesop and I aren't the only two people to feel that way.

sizzle · on May 5, 2014

listen to the album 'Labor Days'

If you like this abstract hip hop then I highly suggest you delve into the artist MF DOOM.

WickyNilliams · on May 4, 2014

Aesop is an excellent lyricist. In fact all the MCs on the Rhymesayers label are very talented: Brother Ali, Slug (of Atmosphere) etc.

One MC whose vocabulary always leaves me taken aback is RA Scion, who has been part of the group Common Market. Their song, "My Pathology" [0] is a shining example:

    "Below the terra ferma's the murmur of many men
    Resonatin' the predication of RA's eponym
    It requires a higher degree of thought to transmit
    Elevate above the base and retrace the semantics
    Incommensurately we've been held incommunicado
    From commoner to commodore – they breed bravado
    I exercise authority over the lesser ranks
    We rally and tally up at the shores of the West Bank"

[0] http://lyrics.wikia.com/Common_Market:My_Pathology

Intermernet · on May 5, 2014

Thanks for mentioning Slug. It'd be interesting to see where he (and Busdriver) came in on the vocab scale.

Fishkins · on May 5, 2014

Slug, unsurprisingly, wouldn't be very high [0]. I've always felt like nearly every rapper he's associated with is great, but he's kind of mediocre.

I don't see any data on Busdriver, but I imagine he'd fare better.

0 - http://www.reddit.com/r/dataisbeautiful/comments/24nw9p/rapp...

sizzle · on May 5, 2014

I'd like to also recommend Stones Throw Records for those just getting into this realm of music.

MF DOOM is my favorite lyricist

e12e · on May 5, 2014

While I wasn't familiar with the label as such (I was surprised at Madlib/Otis Jackson being featured so prominently), the recent documentary "Our Vinyl Weighs a Ton: This Is Stones Throw Records" is highly recommended for those who want to get a glimpse of what the label is all about.

And for those that like both poetry and rap, I suggest these two artists (in every way opposite, yet similar): the legendary Gil Scott-Heron: "We Beg your Pardon":

https://www.youtube.com/watch?v=MDCfEkopryo

And young Kate Tempest (here with Canibal Kids):

https://www.youtube.com/watch?v=TUEsihgq8zU

I'm also partial to "the Streets", RZA and a lot more "mainstream hip-hop".

sizzle · on May 5, 2014

thanks for the recommendations!

e12e · on May 5, 2014

You're most welcome. And thank you for the thanks -- it wasn't entirely given that anyone would read them and enjoy them...

[edit: unless your profile page is meant to be a riddle/tease, I think you might want to put some actual contact info in there... Just saying (no, I don't have a start-up for you (yet anyway) :-) ]

parhamn · on May 4, 2014

Interesting comment about the L, S, and D usage and rhyming. I was particularly surprised by the effort that goes into Eminem's rap, which I had just attributed to "good flow". Some of that effort is explained in this video: https://www.youtube.com/watch?v=ooOL4T-BAg0

carlob · on May 4, 2014

A friend of mine's theory is that Eminem has a great flow because all the vowels sound alike in his Detroit accent.

chaired · on May 5, 2014

Eminem has spoken about how much he enjoys playing with words. One of the techniques he's mentioned is taking two words that don't rhyme and bending the sound of each halfway toward the other.

pla3rhat3r · on May 5, 2014

Easily one of my favorite artists. I'm sad they didn't include more Rhymesayers artists. I think a lot of them would land on the right side of this scale. Guys like P.O.S. and Brother Ali are also very versatile.

chrissyb · on May 5, 2014

+1 for Brother Ali, love his back story too.

pla3rhat3r · on May 6, 2014

Also shocked Atmosphere is not on this list!

chrissyb · on May 14, 2014

Oh my, thanks for the heads-up on this; I had not heard Atmosphere before!

seltzered_ · on May 4, 2014

I found a video rendition of Aesop Rock's "No Regrets" pretty inspiring: https://vimeo.com/14583499

" 1-2-3, that's the speed of the seed

A-B-C, that's the speed of the need

You can dream a little dream or you can live a little dream

I'd rather live it, cause dreamers always chase but never get it"

dons · on May 4, 2014

http://m.youtube.com/watch?v=VNX4spGpIOc&feature=kp I'd recommend Aes' Zodiacpuncture for vocab speed and depth. He basically wins the ranking on that track alone.

Ryanmf · on May 4, 2014

OP: Did your analysis of MF DOOM include his work alongside Madlib as Madvillain, or under his various other pseudonyms (King Geedorah, Viktor Vaughn, etc.)?

I find it a little hard to believe he's not at least in the Wu Tang/Canibus/KK cluster, if not #1 overall.

Tycho · on May 4, 2014

Yeah, I would have thought Doom would be very high. But the density of his lyrics perhaps stems more from allusions/references and humour than from the words themselves.

sizzle · on May 5, 2014

I can't take this list seriously until DOOM is at the top, I agree with you guys. Daniel Dumile is on his own level no doubt.

joefkelley · on May 5, 2014

Seriously, DOOM is in his own league. At one point in "All Outta Ale", he rhymes "3-4-methylenedioxymethamphetamine" with "oxyacetylene."

Also, probably my favorite individual rhyme of all time, from "Meat Grinder": "Borderline schizo, sort of fine tits though"

airfoil · on May 4, 2014

Agree. I was surprised not to see DOOM as well. Another MC I think would score pretty highly is Chino XL.

quux · on May 4, 2014

I wonder where Weird Al Yankovic would come in on this ranking.

coherentpony · on May 4, 2014

Weird Al's songs are not articulate masterpieces, but cheap parodies of other rap songs and rappers. He's probably somewhere around the 5,000 mark with the other artists.

A cursory Google search on the size of the average vocabulary [1] yields an interesting fact. I'm not sure how watertight it is. I realise it's probably unfair to compare the size of the average vocabulary to that of a series of songs, songs being shorter for one. Still, it's interesting.

Moto7451 · on May 4, 2014

Not sure if that's fair to Weird Al. The people he parodies wouldn't really agree either [1]. It's not like he's doing the cheap morning-show tactic of swapping clean words for dirty words or bad puns but leaving the rest of the song intact. His parodies maintain a consistent theme, which is really tough.

[1] http://www.weirdalforum.com/viewtopic.php?t=5673

coherentpony · on May 4, 2014

Yeah, perhaps I was a little harsh. Don't get me wrong, I like Weird Al. I probably could have phrased my comment a little better. I should have said something along the lines of, "In my experience listening to Weird Al, it doesn't feel like he explores a lot of the English language."

Your link is cool, thanks for sharing that.

DigitalSea · on May 5, 2014

Makes me very happy to see Aesop Rock in the #1 spot. He isn't as underground as many people assume: still relatively unknown in the mainstream, but well known enough to sell records and sell out shows. I wasn't a big fan of his 2012 release Skelethon, but the way he structures his lyrics and the meaning behind them means he never writes a bad lyric.

Interestingly, Eminem, who I would have thought would rank pretty highly for his clever method of word bending and enunciation, is only in the middle of the scale. He's still a whole lot better than some of his counterparts, but it's surprising. Another interesting thing to note is Eminem being grouped in the same league as the likes of Jay-Z, Rakim and Lupe Fiasco, with only a couple of hundred unique words separating them from one another.

xentronium · on May 5, 2014

I always thought Eminem was famous for his clever wordplay, not his vocabulary diversity. FWIW, as a non-native speaker I can follow most of his verses. Aesop Rock, on the other hand, is totally indecipherable for me without printed lyrics.

riggins · on May 4, 2014

I find it hilarious that DMX is dead last.

I've now got empirical evidence of what I always thought.

I think DMX rhymes words with themselves more than any rapper I've ever heard.

poink · on May 4, 2014

I'm pretty sure this fails to take into account DMX's rich canine vocabulary.

ziziyO · on May 4, 2014

I think Rick Ross would give DMX a run for his money. I've heard him rhyme a word with the same word before (Atlantic).

ch4s3 · on May 4, 2014

I said out loud before clicking the link that DMX would likely be dead last.

jeffandersen · on May 4, 2014

X gon' Give it to ya? Nope.

aw3c2 · on May 4, 2014

Sampling bias.

ballstothewalls · on May 4, 2014

This is a great graph, but I think it would be neat if a y-axis were thrown in. My first thought was album sales, or some other metric of popularity that would help you find specific rappers quickly instead of going through the huge bunch of little pics.

sareon · on May 4, 2014

This reminds me of a PyCon talk from this year on analyzing rap lyrics with some basic NLP techniques:

http://pyvideo.org/video/2658/analyzing-rap-lyrics-with-pyth...

The author was trying to see if rappers are considered more hateful towards women by their usage of "bitch per song". The results are quite interesting.

zopticity · on May 4, 2014

Lil Jon should be at the bottom with 7 words: "Yeah!", "Okay!", "Shots!" and "Turn down for what?"

axx · on May 5, 2014

He's not, because you forgot: "WHAAAAAT?", "SKIT", "SKIT" and "SKIT!".

rthomas6 · on May 4, 2014

This infographic doesn't take into account later rappers possibly copying earlier, really influential artists, making the earlier influential artists rank lower. More generally, it would be cool to see this chart ranked by the number of original words in each artist's first 35,000 words of lyrics: words that had not yet appeared in anyone's lyrics when the album was published.
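A rough sketch of that "original words" ranking, assuming a hypothetical input of (release_year, artist, lyrics) triples and a simple regex tokenizer (neither comes from the chart). A word counts as original for an artist if it falls within their first 35,000 words and never appeared in any album released in an earlier year; albums from the same year don't count against each other.

```python
import re
from itertools import groupby
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase words, keeping inner apostrophes ("don't")
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def novel_word_counts(albums: List[Tuple[int, str, str]], limit: int = 35_000) -> Dict[str, int]:
    """albums: hypothetical (release_year, artist, lyrics) triples.

    For each artist, count distinct words within their first `limit` words
    that never appeared in an album released in an earlier year.
    """
    seen: Set[str] = set()           # every word from albums in earlier years
    used: Dict[str, int] = {}        # artist -> words of their catalog consumed so far
    novel: Dict[str, Set[str]] = {}  # artist -> words they introduced
    ordered = sorted(albums, key=lambda a: a[0])  # stable sort: ties keep input order
    for _year, group in groupby(ordered, key=lambda a: a[0]):
        this_year: Set[str] = set()
        for _, artist, lyrics in group:
            words = tokenize(lyrics)
            this_year.update(words)  # the whole album counts as "published"
            room = max(limit - used.get(artist, 0), 0)
            counted = words[:room]   # only the artist's first `limit` words are scored
            used[artist] = used.get(artist, 0) + len(counted)
            novel.setdefault(artist, set()).update(w for w in counted if w not in seen)
        seen |= this_year            # later years can't claim these words
    return {artist: len(words) for artist, words in novel.items()}


albums = [(1990, "A", "the cat sat"), (1995, "B", "the cat ran fast"), (1995, "A", "dog")]
print(novel_word_counts(albums))
```

Here B only gets credit for "ran" and "fast", since "the" and "cat" were already on record in 1990. Grouping by year is a design choice; ordering by exact release date would be stricter but needs better metadata.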

ryan1234567890 · on May 6, 2014

To put some perspective on this:

  ryan@3G08:~/Desktop/bleh$ pdftotext David-Foster-Wallace-Infinite-Jest-v2.0.pdf
  ryan@3G08:~/Desktop/bleh$ python dfw.py
  size of vocabulary: 30725

The man passed Shakespeare by 1,896 words with that book.

code:

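  import string

  import nltk
  from nltk.stem import PorterStemmer

  # word_tokenize needs NLTK's "punkt" models: run nltk.download("punkt") once
  path = "/home/ryan/Desktop/bleh/David-Foster-Wallace-Infinite-Jest-v2.0.txt"
  with open(path, encoding="utf-8", errors="ignore") as f:
      raw = f.read()

  # Drop ASCII punctuation and lowercase, so "The" and "the," become the same token
  raw = raw.translate(str.maketrans("", "", string.punctuation)).lower()

  tokens = nltk.word_tokenize(raw)

  # Porter-stem each token so "hang", "hanged" and "hanging" collapse into one entry
  # (the rap chart counts inflected forms separately, so the numbers aren't directly comparable)
  stemmer = PorterStemmer()
  stemmed_tokens = {stemmer.stem(token) for token in tokens}

  print("size of vocabulary:", len(stemmed_tokens))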

Tycho · on May 4, 2014

I've been wanting to do some NLP on Rap Genius's corpus for ages. This is a great analysis. What I had in mind was a program to detect ghostwriting. Rappers probably have some sort of lyrical 'DNA' in the construction of their verses: how often they use certain words, number of words per line, number of unique words per song, ratio of adjectives to nouns, that kind of thing. You could probably unmask some ghostwriting secrets.
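A toy sketch of that lyrical-DNA idea, not a real ghostwriting detector: each artist gets a profile of relative word frequencies, and a disputed verse is attributed to whichever profile is closest by cosine similarity. Words per line and unique-word ratio are computed separately because they sit on a different scale from word frequencies; the adjective/noun ratio would need a part-of-speech tagger and is left out. The tokenizer and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase words, keeping inner apostrophes
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def word_profile(text: str) -> Dict[str, float]:
    """Relative frequency of each word ("how often they use certain words")."""
    words = tokenize(text)
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()} if total else {}


def style_stats(verse: str) -> Dict[str, float]:
    """Words per non-empty line and unique words / total words."""
    lines = [ln for ln in verse.splitlines() if ln.strip()]
    words = tokenize(verse)
    return {
        "words_per_line": len(words) / len(lines) if lines else 0.0,
        "unique_ratio": len(set(words)) / len(words) if words else 0.0,
    }


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(unknown: str, known: Dict[str, str]) -> str:
    """Return the known artist whose word profile is closest to the unknown verse."""
    target = word_profile(unknown)
    return max(known, key=lambda artist: cosine(target, word_profile(known[artist])))


known = {
    "A": "yo yo the money\nthe money yo",
    "B": "cosmic nebula drifting\nquantum nebula",
}
print(most_similar("yo the money yo", known))
print(style_stats("a b\nc d e"))
```

With real data you'd want thousands of words per artist and would probably restrict the profile to common function words, which tend to carry more stylistic signal than topic words.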

Looking at the analysis here, it's interesting to see some clustering in the results. IMO the second cluster is the sweet spot: Wu Tang's excessive invention of vocabulary is cool but probably detracts from the poetic effect. Meanwhile rappers like 2Pac are just kind of boring IMO, at least going by their lyrics alone.

dmourati · on May 4, 2014

I'm a big fan of the project and the way it is presented. Not sure why Wu-Tang features so prominently, but I guess I'm okay with that. Kool Keith should be broken down further into his constituent parts. I also would have thought the Beastie Boys would rank higher.

twic · on May 4, 2014

I'm pretty sure Kool Keith would not be okay with that.

dmourati · on May 4, 2014

He'd be Kool with it.

http://www.koolkeith.co.uk/personas.html

danielsf · on May 5, 2014

OP here: many thanks!

andybak · on May 4, 2014

I would have been rather surprised not to see Aesop Rock fairly high up the list. I was reading the Rap Genius pages for a few of his tracks the other week and the sheer density of wordplay was fairly overwhelming.

It is rap for geeks though ;)

danielsf · on May 4, 2014

author here: hit me up with questions you've got.

pandler · on May 4, 2014

Any chance you would release your code for this? I'd love to run an analysis on some lesser known rappers and play around with some of the filters. Awesome project btw.

EDIT: The reason I ask is that I assume you don't have the time or desire to add every rapper every person asks you for.

danielsf · on May 5, 2014

the code is pretty straightforward, just chapter one of the NLTK book and a txt file.

nltk.org/book
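Chapter one of the NLTK book measures vocabulary as `len(set(text))`. Below is a stdlib-only sketch of that idea applied to the rules quoted in this thread: only the first 35,000 words count, hooks stay in, and every distinct form is counted once ("pimps, pimp, pimping, and pimpin are four unique words"). The regex tokenizer is an assumption; the chart's exact tokenization isn't stated.

```python
import re
from typing import List


def tokenize(lyrics: str) -> List[str]:
    # Assumed tokenizer: lowercase words with inner apostrophes kept ("don't");
    # a trailing apostrophe is dropped, so "pimpin'" becomes "pimpin"
    return re.findall(r"[a-z]+(?:'[a-z]+)*", lyrics.lower())


def vocab_size(lyrics: str, limit: int = 35_000) -> int:
    """Distinct words among the first `limit` words. No stemming, so
    inflected forms like "pimp" and "pimps" count as two words."""
    return len(set(tokenize(lyrics)[:limit]))


print(vocab_size("Pimps, pimp, pimping, pimpin' PIMP"))  # 4
```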

kenshiro_o · on May 4, 2014

Before making this study, what were your predictions? Would you have expected Wu-Tang and GZA to be near the top? What did you expect the average to be?

It would be very interesting to do something similar for rockers too.

danielsf · on May 5, 2014

I only have hip hop data, sadly.

I didn't expect Wu-Tang to be at the top, since its count would span 11 different vocabularies. I felt like the law of averages would basically play out.

just1dude · on May 4, 2014

Here's an artist who meets the requirements for the data: AZ. He's got 5+ albums, some of which went gold, and it'd be interesting, as I believe he may also be the highest-selling solo artist in the top 10 aside from Ghostface.

evan_ · on May 4, 2014

Did you remove hooks/choruses from the artist's 35,000 word count?

danielsf · on May 5, 2014

nope. they are in there. if it's on the rap genius lyrics page, I included it.

Nicholas_C · on May 5, 2014

Awesome work. Did you build a scraper to grab lyrics from Rap Genius?

derwiki · on May 4, 2014

What are Jay Z's stats?

[Edit] also Notorious B.I.G. :)

eieio · on May 4, 2014

Jay Z's stats are there: he's at 4,506. Pretty middle of the pack.

About Biggie: "35,000 words covers 3-5 studio albums and EPs. I included mixtapes if the artist was just short of the 35,000 words. Quite a few rappers don’t have enough official material to be included (e.g., Biggie, Kendrick Lamar)"

And to respond to your child comment, I'm sure the same problem (not enough material) applied to Big L.

bung · on May 4, 2014

Also Big L

Aqueous · on May 4, 2014

Greatly enjoyed the analysis but while I was reading it I felt a lot like this guy:

https://www.youtube.com/watch?v=GKlDBi0cyIA

NAFV_P · on May 4, 2014

All the rappers listed seem to be American.

Whack this through your Bowers and Wilkins:

https://www.youtube.com/watch?v=p_SQEUZomug

S_A_P · on May 4, 2014

I think the only problem I see is that some rap groups are listed as rappers. For instance, the Beastie Boys, De La Soul and Wu-Tang are listed, so some collective vocabulary is being compared to single rappers. That said, this is cool and pretty telling. From what I could see, it is probably loosely coupled to the intelligence of the rappers listed. I will echo the sentiments about DMX here. It looks like some shock-jock rappers are definitely low on the list (Too Short).

rmk2 · on May 4, 2014

> "Wu-Tang Clan at #6 is fucking impressive given that 10 members, with vastly different styles, are equally contributing lyrics. Add the fact that GZA, Ghostface, Raekwon, and Method Man's solo works are also in the top 20 – notably, GZA at #2. Perhaps their countless hours of studio time together (and RZA’s mentorship) exposed each rapper’s vocabulary to one another."

At least in the case of the Wu-Tang Clan, this seems to be deliberate, and it suggests a strong correspondence between the individual members' repertoires on the one hand and the group's vocabulary as a whole on the other, with a presumed exchange in both directions (i.e. both as a deductive and an inductive process).

kenjackson · on May 4, 2014

This is an interesting analysis.

I love the fact that E-40 is about on par with Shakespeare. I'm sure he would take it as a compliment to be called the modern day Shakespeare.

danielsf · on May 4, 2014

author here: thanks!

htk · on May 4, 2014

"Each word is counted once, so pimps, pimp, pimping, and pimpin are four unique words"

So much for the modern Shakespeares on the list.

octo_t · on May 4, 2014

unlike Shakespeare, who was so high-class and never made 'your mom' jokes or used any toilet humour or anything like that.

redacted · on May 4, 2014

And literally made pimp jokes:

http://poetry.rapgenius.com/William-shakespeare-hamlet-act-2...

crdoconnor · on May 4, 2014

or jokes about "nothing".

VMG · on May 4, 2014

As well as the old Shakespeare who was ranked according to the same rules.

htk · on May 4, 2014

I haven't read all his work, but I guess Shakespeare didn't use "hangn" and "hangin" as an alternative to "hanging". The author could validate the words against a dictionary, but it would still be flawed due to conjugations being counted as different words.
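A small sketch of the dictionary check suggested above, using a hypothetical toy word list in place of a real dictionary file. Validation throws out spellings like "hangn" and "hangin", but every inflected form that is itself a real word still counts separately, which is exactly the remaining flaw.

```python
from typing import Iterable, Set, Tuple


def vocab_with_dictionary(tokens: Iterable[str], dictionary: Set[str]) -> Tuple[int, int]:
    """Return (raw unique count, unique count restricted to `dictionary`)."""
    unique = set(tokens)
    return len(unique), len(unique & dictionary)


# Hypothetical toy word list standing in for a real dictionary file
DICTIONARY = {"hang", "hangs", "hanged", "hanging", "hung"}
tokens = ["hanging", "hangin", "hangn", "hang", "hanged", "hanging"]
print(vocab_with_dictionary(tokens, DICTIONARY))  # (5, 3)
```

Three of the five raw forms survive the check, and all three are forms of the same verb.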

GFK_of_xmaspast · on May 4, 2014

What about conjugations like hang'd vs hanged to fit the meter?

koala_advert · on May 5, 2014

I keep getting this error, in Firefox and Chrome:

<Error> <Code>AccessDenied</Code> <Message>Access Denied</Message> <RequestId>3CB1F41D7DFDC794</RequestId> <HostId> wHCPzEYPDsmkMJX+YIgjU40YPrGYytHrk5B44dApi7663NkQQI0RKx9A/6EX7Iph </HostId> </Error>

dfc · on May 6, 2014

Do you have HTTPS Everywhere? I kept getting this error with HTTPS Everywhere. You need to turn off the rule for AWS.

danielsf · on May 5, 2014

hm. it should be up. try it again.

dnautics · on May 4, 2014

How about a 2D visualization with a sliding 10,000-word window, with the y axis as unique words out of 10k and the x axis as time? Are there cultural trends that are time-dependent? Did Young MC and Del use more words than contemporary artists? Did their trends as artists follow the global trend over time?
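One way to get the y values for that plot, sketched under the assumption that an artist's tokens have already been concatenated in release order (the caller's job). The window slides forward by a hypothetical `step` of 1,000 words, and a running `Counter` is updated incrementally instead of recounting each 10,000-word window from scratch.

```python
from collections import Counter
from typing import List


def sliding_unique_counts(tokens: List[str], window: int = 10_000, step: int = 1_000) -> List[int]:
    """Unique-word count for each full window of `window` tokens, advancing by `step`."""
    if len(tokens) < window:
        return []
    counts = Counter(tokens[:window])
    result = [len(counts)]
    start = 0
    while start + step + window <= len(tokens):
        # Drop the tokens that fall off the left edge of the window...
        for tok in tokens[start:start + step]:
            counts[tok] -= 1
            if counts[tok] == 0:
                del counts[tok]
        # ...and add the ones entering on the right
        counts.update(tokens[start + window:start + window + step])
        start += step
        result.append(len(counts))
    return result


print(sliding_unique_counts(list("aabbcc"), window=2, step=1))  # [1, 2, 1, 2, 1]
```

Each value could then be plotted against the release year of the album where its window starts, giving the time axis.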

selimthegrim · on May 4, 2014

Maybe this will help me answer that nagging question at the back of my brain: What does DJ Khaled actually _do_?

Grue3 · on May 5, 2014

Would be interesting to see how they compare to rock bands like Titus Andronicus, Fucked Up or Bad Religion.

Totient · on May 4, 2014

I wonder where things like classic rock / Broadway musicals / opera / etc. fit on this spectrum.

I really appreciate including Shakespeare and Moby Dick on the spectrum, but I'd still like some more perspective. For that matter, I wonder how many unique words I use every day.

tokipin · on May 4, 2014

Just a note: those artists don't necessarily use all of their vocabulary. Eminem, for example, clearly holds back on his. Rap is as much an art as anything can be, so there are all sorts of factors. Be careful about drawing conclusions from this beyond curiosity.

x3ro · on May 4, 2014

> Eminem for example clearly holds back on his vocabulary.

What makes you say this?

richardlblair · on May 4, 2014

He would spend hours studying the dictionary.

likeclockwork · on May 4, 2014

I'm sure the answer to this question is "He looks more articulate than the average rapper."

omegaham · on May 4, 2014

Gibes regarding racism aside, some people seem more articulate due to the fact that they carry themselves differently during interviews.

Of course, many musicians will keep a persona going during interviews as well, so it's still not a very reliable metric.

The most extreme example I've seen was Marilyn Manson, but there are plenty of musicians who rap / sing about really inane stuff and then show that they're way smarter than the way they present themselves with their music.

matwood · on May 5, 2014

> there are plenty of musicians who rap / sing about really inane stuff and then show that they're way smarter than the way they present themselves with their music.

As mentioned in the article, Jay-Z even raps about doing just that.

sbierwagen · on May 4, 2014

Cool to see Canibus so high in the rankings.

It'd also be cool to add the members of AOTP to the analysis.

tern · on May 4, 2014

I would love to see this analysis without filters. Who is the rapper with the largest vocabulary? What does the distribution look like at the top? Surely Antipop Consortium or MF DOOM have larger vocabularies than Aesop for instance.

Fishkins · on May 4, 2014

MF DOOM is on the list. He's above average, but well below Aes or most of Wu-Tang. I've listened to a fair number of rappers, and I was pretty confident Aes would be at the top of this list.

I agree it would be cool to see a list of all rappers, though. I was surprised not to see Del, and maybe there is some more obscure rapper I'm not thinking of with a broader vocabulary than Aes.

gfody · on May 5, 2014

I'm pretty sure E-40 scored so high because of all the made up words. He's highly regarded for being innovative and influential but you know for every piece of slang that stuck there's like ten that didn't.

ff10 · on May 4, 2014

Really surprised MF Doom is not ranked higher – are his side projects included?

oakaz · on May 4, 2014

Why isn't Jedi Mind Tricks counted? He'd be first on this list: https://www.youtube.com/watch?v=TlZgiK6FiO0

VeejayRampay · on May 4, 2014

Jedi Mind Tricks is not one person. It's two rappers (Vinnie Paz and Jus Allah).

pandler · on May 4, 2014

Jus Allah has been in and out of JMT. http://en.wikipedia.org/wiki/Jedi_Mind_Tricks

oakaz · on May 4, 2014

It doesn't matter, they are still rappers, and I bet they'd be near the top.

VeejayRampay · on May 6, 2014

You're right. And I agree, they most likely would.

Army of the Pharaohs would be up there as well.

Mikeb85 · on May 4, 2014

Not particularly surprised at the list. Aesop Rock, the whole Wu-tang Clan, and guys like Nas, Wale, all near the top. DMX and Too Short at the bottom...

Definitely comes out in their music...

oinksoft · on May 4, 2014

  > Definitely comes out in their music...

That's right, Too $hort's music is laser-focused!

jarnix · on May 4, 2014

How many words in "fo shizzle ma nizzle"? 4 or 0?

danielsf · on May 5, 2014

camus2 · on May 4, 2014

I would love the same chart but sorted by vulgarity.

ignacioelola · on May 4, 2014

I would love to see the same analysis across different music styles. How does the vocabulary size of Madonna, Bob Dylan and Justin Bieber compare?

zerohm · on May 5, 2014

Yeah, I'd like to see where The Clash rank.

danielsf · on May 5, 2014

I only have rap data, sadly :(

ignacioelola · on May 6, 2014

Matt, I can very easily give you a corpus for other artists. I'll send you an email.

bladecatcher · on May 5, 2014

I would like to see Dälek included in the study. I'd be surprised if they didn't show up on the far right on the scale.

konceptz · on May 4, 2014

What I would like to see is this same comparison done against album sales, with the implication of mainstream vs. underground.

danielsf · on May 5, 2014

I kept this focused on vocab so that the data viz was very straightforward and easy to digest/draw insights from. I've had many requests for album sales to be added, and I plan to as soon as possible :)

b3b0p · on May 4, 2014

Was it mentioned where the data was sourced from? I'm not seeing anything and I went back and checked. Did I miss it?

mryan · on May 4, 2014

> All lyrics are provided by Rap Genius, but are only current to 2012. My lack of recent data prevented me from using quite a few current artists.

jomtung · on May 4, 2014

Killah Priest should be grouped with Wu-Tang.

danielsf · on May 5, 2014

OP: he's an associate, not a member

zeppelinnn · on May 4, 2014

This is awesome. Reminds me of all the data viz they are doing on rapgenius. You forgot Atmosphere though (Slug)

m_mueller · on May 4, 2014

I'd be interested in how Nerdcore rappers compare to this, such as MC Frontalot or Professor Elemental.

xkarga00 · on May 4, 2014

Have you checked out MC Paul Barman?

m_mueller · on May 4, 2014

Not yet, thanks for the hint.

devindotcom · on May 4, 2014

Couldn't find Aceyalone - I thought he'd be in the top 10, I guess he wasn't included.

crusty · on May 5, 2014

Top 10? I would put money on him being at least #2 and giving Aesop Rock a run for his money for #1 (depending on which albums his 35,000 words fall on; he's got a solid discography). I just put on A Book of Human Language (full album: https://www.youtube.com/watch?v=lwVNp42l3Xo , or a solid song selection: https://www.youtube.com/watch?v=GnsCO0Fxw3A ). Along with that, Wu Tang as the only crew analyzed? How about Freestyle Fellowship (Aceyalone's crew), Quannum Collective, Jurassic 5?

I'd also like to see Mos Def on the list, along with everyone from Quannum, and the Soulsides and SoundBombing volumes.

Also, (unique words : total words) might be an interesting scoring method, and would allow comparison over entire works regardless of their individual volumes of output. Or choosing a random sample of # of words as opposed to first # of words, as someone who started publishing as a young buck may take a hit for early immaturity.
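Both scoring ideas are easy to sketch. The ratio is usually called the type-token ratio; the sampler below draws individual words at random (without replacement) from an artist's whole catalog instead of taking the first N, averaging over a few trials. Function names, the trial count and the seed are hypothetical choices.

```python
import random
from typing import List


def type_token_ratio(tokens: List[str]) -> float:
    """Unique words divided by total words (0.0 for an empty list)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def sampled_vocab(tokens: List[str], n: int = 35_000, trials: int = 20, seed: int = 0) -> float:
    """Average vocabulary size over `trials` random samples of `n` words
    drawn without replacement from the whole catalog."""
    if not tokens:
        return 0.0
    rng = random.Random(seed)  # fixed seed so rankings are reproducible
    n = min(n, len(tokens))
    sizes = [len(set(rng.sample(tokens, n))) for _ in range(trials)]
    return sum(sizes) / len(sizes)


print(type_token_ratio(["a", "b", "a", "c"]))  # 0.75
```

One caveat with the ratio: it tends to fall as a text gets longer, because common words keep repeating while new words show up more slowly, so a prolific artist's whole-catalog ratio would still be dragged down. Fixing the sample size, as the chart does with 35,000 words, sidesteps that; random sampling keeps the fixed size while spreading it over a whole career. Sampling whole songs rather than single words would be a reasonable variant.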

officialjunk · on May 5, 2014

Mos Def is in there

crusty · on May 5, 2014

thanks, must have missed him

russelluresti · on May 5, 2014

Aceyalone would be a great addition. I'd also like to see Chali 2na and the whole Jurassic 5 crew.

thegasman · on May 4, 2014

No mention of MF Doom? Metalface? Doom? Viktor Vaughn? (All the same gentleman from LA)

joshschreuder · on May 4, 2014

MF DOOM is on there pretty much right on the Shakespeare line. It's not clear if it includes his work as Danger / Villain / etc.

xkarga00 · on May 4, 2014

Actually, he is from the UK

tps12 · on May 5, 2014

So funny comparing this to the same graph they did for pop lyricists.