I enjoyed reading some of their results, but I see a couple of methodological issues:
1) The test makes you distinguish between real words and a set of words they've made up in some way. As others have pointed out, some of them are pretty obvious non-words. I would expect the results to change depending on the method used to generate non-words.
2) Measuring the performance of a binary classifier is a well-studied problem, with many established metrics for quantifying performance (http://en.m.wikipedia.org/wiki/Binary_classification#Evaluat...). Subtracting the false positive rate from the true positive rate does appear there, as Youden's J statistic, but that measures how well you discriminate real words from fakes. The final score is not a consistent estimator of the fraction of words in their corpus you know, as the sketch below illustrates.
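To make that concrete, here is a minimal simulation sketch under an assumed (my assumption, not necessarily their model) guessing strategy: the test-taker recognizes a fraction p_known of the real words and clicks "I know it" on any unrecognized word, real or fake, with a fixed probability guess_rate. Under that model the TPR - FPR score converges to p_known * (1 - guess_rate) rather than p_known:

    import random

    def simulate(p_known=0.60, guess_rate=0.30, n_real=100_000, n_fake=100_000):
        # Test-taker knows a fraction p_known of real words; for any word they
        # don't recognize (real or fake) they click "yes" with prob guess_rate.
        tp = sum(1 for _ in range(n_real)
                 if random.random() < p_known or random.random() < guess_rate)
        fp = sum(1 for _ in range(n_fake) if random.random() < guess_rate)
        tpr, fpr = tp / n_real, fp / n_fake
        return tpr - fpr

    # With p_known = 0.60 and guess_rate = 0.30 the score settles near
    # 0.60 * (1 - 0.30) = 0.42, not 0.60, no matter how many items you add,
    # i.e. it is biased and therefore not consistent.
    print(simulate())

Under this particular guessing model, (TPR - FPR) / (1 - FPR) would recover p_known, which is the classic correction-for-guessing adjustment; the raw difference does not.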