Ghent University Vocabulary Test (vocabulary.ugent.be)
43 points by zhte415 on June 26, 2014 | 45 comments



I enjoyed reading some of their results. I see some methodological issues here, however:

1) The test makes you distinguish between real words and a set of words they've made up in some way. As others have pointed out, some of them are pretty obvious non-words. I would expect the results to change depending on the method used to generate non-words.

2) Measuring the performance of a binary classification is a well studied problem with many metrics and approaches to quantifying performance (http://en.m.wikipedia.org/wiki/Binary_classification#Evaluat...). Subtracting the false positive rate from the true positive rate is not among them. The final score is not a consistent estimator of the fraction of words in their corpus you know.
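As a rough sketch of the scoring described in point 2, assuming the final score is simply the fraction of real words answered "yes" minus the fraction of non-words answered "yes". The function name `vocab_score` and all counts below are made up for illustration; the site's actual word counts and formula are not given in this thread.

```python
def vocab_score(real_yes, real_total, fake_yes, fake_total):
    """Return hit rate minus false-alarm rate, a fraction in [-1, 1].

    Assumed scoring rule (not confirmed by the test's authors):
    true positive rate on real words minus false positive rate on non-words.
    """
    if real_total <= 0 or fake_total <= 0:
        raise ValueError("need at least one real word and one non-word")
    hit_rate = real_yes / real_total          # share of real words marked "yes"
    false_alarm_rate = fake_yes / fake_total  # share of non-words marked "yes"
    return hit_rate - false_alarm_rate


# Hypothetical session: 70 real words and 30 non-words shown.
score = vocab_score(real_yes=63, real_total=70, fake_yes=3, fake_total=30)
print(f"{score:.0%}")
```

Under this assumed rule, a few "yes" answers on non-words cost a lot because the non-word pool is small, and saying "no" whenever unsure lowers the hit rate without touching the penalty term.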


A lot of these were strange formations of regular words.

For example, I said I didn't know "symphonize" and "discussible" because I have never ever seen either form used anywhere. But obviously, I know the words "symphony" and "discuss", so I can infer their meaning from the suffixes.


I tried to only answer yes when I knew what the words meant rather than guessing. I also said "bubba", "bumf" and "nonsuccess" weren't words which it disagreed with.

Most of the other things that I didn't get were from biology. "lymphoid" (guessed it might have been a word but hadn't heard it so entered no), "dabchick" etc.

"bubba" and "bumf" are interesting ones, as they raise the question of where the language ends and local dialects and slang begin (or are local dialects and slang within the language, in which case an exhaustive list is impossible?).


Interesting, for me, to see 'bumf', because even though I recognize the word from my English dialect (Southern UK, originally), I'd mark that form as a nonword because I feel it's spelled wrong: I would write it 'bumph'. I have no idea why I would feel so strongly about that, though, because I can't imagine it's a word I've written or seen written down particularly frequently...


I don't think I've seen "bumf" written down but I've certainly heard it a few times.


The non-words threw me off the first time because I wasn't expecting them to be there. By the end of the test I felt like I didn't know enough English; scored 89%.


Yeah, same here, I got 89% of the real words, but messed up with 4-5 non-words which lowered my score to 76%.


Non native, 81%. I think the test might be easier for non-natives whose native language is European, as they accumulate etymological understanding from several languages.


I'd also guess that non-native speakers are less likely to say "yes" to fake words. Non-native speakers are, presumably, more self-aware of the extents and limitations of their English vocabularies. Native speakers seem more likely to succumb to issues of vanity, overconfidence, or bet-hedging on this sort of test.


The non-words are pretty easy to spot. On my first try I marked any word I was uncertain of as a non-word. I got a 75% with zero false positives. I looked at all the words that I marked as non-words and saw that most of them I had suspected were real words. Then I tried it again with an increased confidence and got 90% with zero false positives.


Native speaker and did 77% and 94% the first and second times respectively. My favourite "non-words" though would have to be "meedcave" and "cunstalize". I'm not quite sure what "cunstalize" would mean, but I feel like I desperately need a meedcave.


Non native, 73%. 0 non words.

accurse, glycol, propitiation, tumescence, landlocked, klystron, squab, blithesome, lacertian, dingbat, gradate, adjudge, microsomal, latescence, intercut, aviary, semis, vie, dollarfish

But given that Shakespeare wrote all his books with about 1.5k words, this test doesn't prove much :)


Non-native speaker, got 93%. But English is almost like my first language. I haven't spoken anything else for decades, even though I speak 3 other languages, and think, dream and express myself best in English.


It seems I know 90% of English language words and it is extremely unlikely that I know that many. I believe the numbers are off by a large margin. I would love to look how they arrive at those estimates.


Non Native: 69%. Took Iter as a non word. Been using that too much as a name in programming I guess. Been too trigger happy on the 'f' key with a few words I did know though.



Native; 93%.

Missed: rood, ceil, catchfly, slickens, tuberculation

Some of those non-words were gorgeous: costyhibbles, neatherden, quiffiness, concodion


Maybe they should spin their word generator off into a domain name finder?


Quiffiness looks like a perfectly cromulent word.


Can't seem to progress beyond the initial press yes or no screen, with no js errors in the console. Odd.



It seems to me that this much personal information should be transmitted over a secure connection.


Why did they make the interface so tricky? I think they're testing two things here...


They also test response time; they want to make it an easy left/right decision so they can record the time it takes to recognize a word.


Native, 84%. I missed 'pshaw', but 'myeah' is apparently a non-word!


Same for me. However, they declared "clead" to be a non-word and, well, http://www.merriam-webster.com/dictionary/clead makes one wonder how well their database would score against the OED.


90%. Native speaker. The non words were gems, blurishness... why did I say yes to that?


But "gems" is a word - as in "gemstones".


Haha you may have parsed bitexploder's comment slightly incorrectly.

> The non words were gems, blurishness... why did I say yes to that?

Try reading that as:

> The non words were gems. Blurishness... why did I say yes to that?


Apropos, the title spells Ghent differently from the domain name.


Ghent is the English name for the Belgian city Gent.


Non native, 67%.


for science! Non-native: 81%

EDIT: and 0 non-words


Pretty impressive! Native speaker: 69%.

Leaned on the side of `no`: 0 non word yesses at least.


Yes given the admonishment about "heavy penalties" at the beginning I hit no on any word I was on the fence about (and accidentally when I got to "triennially"). 71% as a native speaker.


Non-native: 67%. It's good to hear that I know 67% of English words.


I wish they had the statistics available for people who complete the test. It would be interesting to look at how native speakers and non native speakers differ.


OK, I failed this test before I even got to it. Apparently you are only allowed to put one country in the "Where did you grow up in?" question. I don't know why they aren't interested in the verbal abilities of the large number of people whose parents weren't sedentary - e.g. military, mining industry, diplomats, aid, disaster, mega-construction etc etc.

/whine over.

EDIT: Also, no way to communicate this flaw to them (I'm not on Twitter).


Well, there is a way to communicate this flaw to them. You've just chosen not to participate in that medium.


True. I could also walk over to the place and tell them in person (after a flight and a drive etc). My thoughts are that if you are asking people to volunteer to contribute, expecting people to sign up to a third party in order to communicate is not "frictionless". After all, there is a thing called email that's been around for a while and is used by pretty much everyone. Also post forms. The main reason for using Twitter (my belief) is to get your users to promote you for free.


Oops, you closed the whine tag too early!


I know! That's annoying me as well! ;)


Because having a multi-select plays havoc with stats.


Which is a terrible excuse for putting a fundamental flaw into the analysis. It's building a big assumption into the model that's known not to hold true in reality. Especially as the mis-represented cohort will, linguistically, be one of the most interesting - likely above average intelligence and a very different exposure to language. I suspect personally it's an oversight.

EDIT: Or, as is dawning on me, I might have missed a bit of sarcasm?


Indeed, reality has a nasty habit of confounding our models.



