I enjoyed reading some of their results. I see some methodological issues here, however:
1) The test makes you distinguish between real words and a set of words they've made up in some way. As others have pointed out, some of them are pretty obvious non-words. I would expect the results to change depending on the method used to generate non-words.
2) Measuring the performance of a binary classifier is a well-studied problem with many metrics and approaches to quantifying performance (http://en.m.wikipedia.org/wiki/Binary_classification#Evaluat...). Subtracting the false positive rate from the true positive rate does have a name (Youden's J statistic, or informedness), but even so, the final score is not a consistent estimator of the fraction of words in their corpus you know.
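To make that concrete, here is a tiny Python sketch (all counts invented, and it assumes the site scores as hit rate minus false alarm rate) showing how that score diverges from other standard metrics computed on the same confusion matrix:

    # Hypothetical confusion matrix for one test-taker:
    # 70 real words, 30 non-words; counts invented for illustration.
    tp, fn, fp, tn = 60, 10, 6, 24

    tpr = tp / (tp + fn)                        # hit rate: 0.86
    fpr = fp / (fp + tn)                        # false alarm rate: 0.20
    score = tpr - fpr                           # Youden's J / informedness: 0.66
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # 0.84
    precision = tp / (tp + fp)                  # 0.91

    print(f"TPR - FPR: {score:.2f}")
    print(f"Accuracy:  {accuracy:.2f}")
    print(f"Precision: {precision:.2f}")

Each metric weights the two error types differently, so no single subtraction can be read off as "fraction of the corpus known" without a model of how people guess.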
A lot of these were strange formations of regular words.
For example, I said I didn't know "symphonize" and "discussible" because I have never, ever seen either form used anywhere. But obviously I know the words "symphony" and "discuss", so I can infer their meaning from the suffixes.
I tried to answer yes only when I knew what the words meant, rather than guessing. I also said "bubba", "bumf" and "nonsuccess" weren't words, which it disagreed with.
Most of the other ones I didn't get were from biology: "lymphoid" (I guessed it might be a word but hadn't heard it, so I entered no), "dabchick", etc.
"bubba" and "bumf" are interesting ones as they ask the question of where the language ends and local dialects and slang begin (or are local dialects and slang within the language in which case an exhaustive list is impossible).
Interesting, for me, to see 'bumf', because even though I recognize the word from my English dialect (Southern UK, originally), I'd mark that form as a nonword because I feel it's spelled wrong: I would write it 'bumph'. I have no idea why I would feel so strongly about that, though, because I can't imagine it's a word I've written or seen written down particularly frequently...
The non-words threw me off the first time because I wasn't expecting them to be there. By the end of the test I felt like I didn't know enough English; scored 89%.
Non-native, 81%. I think the test might be easier for non-natives whose native languages are European, as they accumulate etymological understanding from several languages.
I'd also guess that non-native speakers are less likely to say "yes" to fake words. Non-native speakers are, presumably, more self-aware of the extents and limitations of their English vocabularies. Native speakers seem more likely to succumb to issues of vanity, overconfidence, or bet-hedging on this sort of test.
The non-words are pretty easy to spot. On my first try I marked any word I was uncertain of as a non-word. I got 75% with zero false positives. I looked at all the words that I had marked as non-words and saw that I had suspected most of them were real. Then I tried it again with more confidence and got 90%, still with zero false positives.
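If the score really is hit rate minus false alarm rate (my assumption), that threshold effect is easy to simulate. A rough Python sketch, with an entirely invented familiarity model:

    import random
    random.seed(1)

    # Invented model: each word gets a subjective familiarity score;
    # real words tend to feel more familiar than made-up ones.
    words =  [(random.gauss(1.0, 0.5), True)  for _ in range(70)]
    words += [(random.gauss(0.0, 0.5), False) for _ in range(30)]

    def score(threshold):
        # Answer "yes" only when familiarity clears the threshold.
        tp = sum(1 for f, real in words if real and f > threshold)
        fp = sum(1 for f, real in words if not real and f > threshold)
        return tp / 70 - fp / 30    # hit rate minus false alarm rate

    # Sweep from trigger-happy to very cautious: too low admits fakes,
    # too high throws away real words you were merely unsure of.
    for t in (0.0, 0.5, 1.0):
        print(f"threshold {t:.1f}: score {score(t):+.2f}")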
Native speaker and did 77% and 94% the first and second times respectively. My favourite "non-words" though would have to be "meedcave" and "cunstalize". I'm not quite sure what "cunstalize" would mean, but I feel like I desperately need a meedcave.
Non-native speaker, got 93%. But English is almost like my first language. I haven't spoken anything else for decades, even though I speak 3 other languages, and think, dream and express myself best in English.
It seems I know 90% of English words, and it is extremely unlikely that I know that many. I believe the numbers are off by a large margin. I would love to see how they arrive at those estimates.
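My guess at the obvious approach (pure speculation on my part, not their documented method): sample the real test words uniformly from a fixed word list, treat the guessing-corrected yes-rate as the fraction known, and scale up. In Python:

    # Speculative back-of-the-envelope: if real test words are sampled
    # uniformly from a word list of known size, the corrected yes-rate
    # extrapolates to an absolute vocabulary estimate.
    word_list_size = 62000      # invented corpus size, for illustration
    tpr, fpr = 0.92, 0.02       # made-up hit and false alarm rates
    known_fraction = tpr - fpr  # the reported score: 0.90
    print(f"Estimated vocabulary: ~{known_fraction * word_list_size:,.0f} words")

Whether that extrapolation is sound depends entirely on how the word list and the non-words were sampled, which is exactly the part I'd like to see.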
Non-native: 69%. I took "iter" as a non-word; I've been using it too much as a name in programming, I guess. I was also too trigger-happy on the 'f' key with a few words I did know, though.
Yes, given the admonishment about "heavy penalties" at the beginning, I hit "no" on any word I was on the fence about (and accidentally on "triennially"). 71% as a native speaker.
I wish they had the statistics available for people who complete the test. It would be interesting to look at how native and non-native speakers differ.
OK, I failed this test before I even got to it. Apparently you are only allowed to put one country in the "Where did you grow up in?" question. I don't know why they aren't interested in the verbal abilities of the large number of people whose parents weren't sedentary - e.g. military, mining, diplomacy, aid and disaster work, mega-construction, etc.
/whine over.
EDIT: Also, no way to communicate this flaw to them (I'm not on Twitter).
True. I could also walk over to the place and tell them in person (after a flight and a drive, etc.). My thought is that if you are asking people to volunteer to contribute, expecting them to sign up to a third party in order to communicate is not "frictionless". After all, there is a thing called email that's been around for a while and is used by pretty much everyone. There are also web forms. The main reason for using Twitter (my belief) is to get your users to promote you for free.
Which is a terrible excuse for putting a fundamental flaw into the analysis. It builds a big assumption into the model that's known not to hold true in reality. Especially as the misrepresented cohort will, linguistically, be one of the most interesting - likely above-average intelligence and a very different exposure to language. Personally, I suspect it's an oversight.
EDIT: Or, as is dawning on me, I might have missed a bit of sarcasm?