You can estimate the literacy of a forum by choosing "annotate results with reading levels", then doing a search for site:example.com. It gives a breakdown of the fraction of pages at each reading level. Far from perfect, but interesting nevertheless.
If I understand it right, this does not really measure literacy, but the proportion of complicated words used. Mathworld is a site about math, and math uses a lot of special terms, so of course it has a high reading level.
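To make the "proportion of complicated words" idea concrete, here is a rough sketch. It is not Google's method, which isn't described anywhere in this thread. It assumes that "complicated" just means "long", and both the `complex_word_fraction` name and the 9-letter cutoff are made up for illustration.

```python
import re

def complex_word_fraction(text, min_length=9):
    """Return the fraction of words that have at least min_length letters.

    Assumption: word length stands in for "complicated". The cutoff is arbitrary.
    """
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    long_words = [w for w in words if len(w) >= min_length]
    return len(long_words) / len(words)

plain = "The cat sat on the mat and looked at the dog."
mathy = "Eigenvalues characterize diagonalizable endomorphisms of finite-dimensional spaces."

print(complex_word_fraction(plain))
print(complex_word_fraction(mathy))
```

A score like this rates the math sentence far above the plain one purely because of its vocabulary, which is exactly the objection: it rewards jargon, not writing quality.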
Oh YES. This makes me very, very happy. Now, can I have it as an option in the search settings, or at least readily accessible from the left-hand gutter, please?
Pretty please? I'll be your best friend, heroic anonymous Google search engineers!
It would be awesome if they could integrate that with a translation feature, such that content in high reading level (perhaps dense with technical jargon, etc.) could be translated into a simpler, yet semantically correct version.
Searching for "xsd" (xml schema document), the actual specs only come up at the advanced level, which is pretty accurate if you've ever suffered through them. Wikipedia entries appear here too. http://www.google.com.au/search?tbs=rl%3A1%2Crls%3A2&q=x...
intermediate gives you tutes and tools.
basic has news, forum posts, YouTube (comments?) and (apparently) random-ish lists
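For anyone who wants to reproduce links like the one above, here is a small sketch that rebuilds the query string. The `tbs=rl:1,rls:2` value is copied from that link; what each number means, and whether Google still honors the parameter, are assumptions this does not check. The original link is truncated, so the query below comes from the comment text rather than from the link, and `reading_level_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

def reading_level_search_url(query, tbs="rl:1,rls:2", host="www.google.com.au"):
    """Build a search URL carrying the tbs parameter seen in the link above.

    Assumption: the tbs value is reused verbatim; its semantics are not verified.
    """
    params = {"tbs": tbs, "q": query}
    # urlencode percent-encodes ':' and ',' so the result matches the link's form.
    return f"http://{host}/search?{urlencode(params)}"

print(reading_level_search_url("xsd"))
print(reading_level_search_url("site:example.com"))
```

The second call is the site: trick from the top of the thread, expressed the same way.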
I got to thinking, how would America's universities compare? It's an interesting thought: The best universities should have the most advanced reading level.
I'm surprised I can't yet find a service that rates the reading level of books so as to guide selection by early readers (typically children, but also learners of foreign languages). Amazon? Startup?
Who do you mean by "they"? This is something that only site owners might be expected to do, definitely not Google. But even assuming there is a standard way of describing that, I doubt that anybody would care enough to include such metadata. An algorithmic approach is so much more effective in this case.
I suppose I should have said "supplemented the search results". I disagree that people don't care (many people already do care about educational metadata).
Also, there are always issues with spam when you allow site owners to add such tags. Remember how the <meta keywords> tag has become useless, as it's ignored by almost all search engines nowadays.
RDFa, for example, allows much more expressive power than META tags did and lets you get very specific. It also allows you to model relationships you never could have expressed with a META tag. Insofar as there is a lot of value in consuming data from good actors, there will be an incentive to filter out the bad actors.
A new challenge for HN: Find a pair or series of words that are synonymous but have the largest variance in the share of results rated Advanced.
Bonus points if you can provide a progression of words from basic to advanced.
Advanced results bring the Equus africanus asinus from Wikipedia first. Good. Basic results bring Donkey (Shrek) from Wikipedia second. Google might be onto something here .... but wait.... what is it there at number one for the basic reader... hmm... "YouTube - Donkey Rapes Man". Hmm... who are our "basic" readers? ... gotta think about this feature a little more.... ;) Meanwhile, a classic: http://www.youtube.com/watch?v=qRm8okHhapU
This is actually a reasonably good search. It does highlight the clashes decently well.
I would like it if Google expanded this to allow you to prioritize. Prioritizing advanced reading level documents seems like it would be of value to me.
It's worth noting that, yes, effective communication means that the document should be readable by anyone, but it doesn't seem that Google is tagging pompous, impenetrable documents as "advanced". For example, the [donkey] search comes up with "All About DONKEYS!" [1] as an "advanced" text. It doesn't seem to be difficult to read; it is just information-rich.
The search for [cheese], however, doesn't sort the wheat from the chaff like [donkey]. Gnome's Cheese project, for example, is classified as "Basic reading level", and so is filtered out from the "advanced" search.