
  > Is the Google example oversimplified,
  > or is my history wrong?
  > It's reduced here to a "simple database search"
  > but I thought some of Google's novel search technology
  > was there from the get-go

The article called it "a simple database search returning ranked results based on the number of backlinks." Therein lies the rub. Larry Page came up with the idea of ranking pages by their backlinks, inspired, I think, by scholarly publishing, where one measure of a paper's importance is how many other papers cite it. The project was first called BackRub and later renamed Google; the link-ranking algorithm itself became PageRank.
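For a concrete picture of the difference between counting backlinks and PageRank, here is a minimal sketch on a made-up four-page link graph (pages "a" to "d" are hypothetical). A raw backlink count treats every incoming link the same. PageRank, as Page and Brin published it, lets a link from a well-linked page count for more, using a damping factor commonly set to 0.85. This toy power iteration assumes every page has at least one outgoing link and is an illustration, not Google's implementation.

```python
# Toy link graph (hypothetical): each page maps to the pages it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def backlink_counts(links):
    """Citation-style measure: how many pages link to each page."""
    counts = {page: 0 for page in links}
    for _source, targets in links.items():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    Assumes every page has at least one outgoing link (no dangling pages),
    so the scores keep summing to 1 on every iteration.
    """
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(iterations):
        # Every page gets the same small "random jump" share...
        new_rank = {page: (1.0 - damping) / n for page in links}
        # ...plus an equal slice of the rank of each page linking to it.
        for source, targets in links.items():
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    print(backlink_counts(LINKS))
    for page, score in sorted(pagerank(LINKS).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

Pages a and b each have exactly one backlink, but a ends up well ahead of b because its single link comes from c, the most-linked page in the graph.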

Of course there were other ingredients in their secret sauce, and they are forever tweaking it. But I would say that backlinks made the biggest difference. In those days HotBot and AltaVista returned a cacophony of results based on word matches in the body of the page itself. Google, I believe, was the first search engine to rank a page by the text of the links on other websites pointing to it. One of Sergey Brin and Larry Page's professors tells the story of being introduced to their prototype and searching for "Stanford," and for the first time the top result was actually stanford.edu.
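The difference the professor noticed comes from indexing a page by what other pages say about it. Here is a minimal sketch of that idea, assuming a toy list of (linking page, anchor text, target page) tuples invented for illustration: each query word is matched against the anchor text of incoming links, and targets are ranked by how many matching links they receive. The function names are hypothetical, and real engines combined this with many other signals.

```python
from collections import Counter, defaultdict

# Hypothetical crawl data: (linking page, anchor text, linked-to page).
LINKS = [
    ("cs.example.edu/~alice", "Stanford", "stanford.edu"),
    ("news.example.com/story", "Stanford University", "stanford.edu"),
    ("blog.example.org/post", "Stanford", "stanford.edu"),
    ("fan.example.net/home", "my Stanford trip photos", "fan.example.net/photos"),
]


def build_anchor_index(links):
    """Map each lowercase anchor-text word to a Counter of the pages it points at."""
    index = defaultdict(Counter)
    for _source, anchor, target in links:
        for word in anchor.lower().split():
            index[word][target] += 1
    return index


def search(index, query):
    """Rank target pages by how many incoming-link anchor words match the query."""
    scores = Counter()
    for word in query.lower().split():
        # .get avoids inserting empty entries into the defaultdict.
        scores.update(index.get(word, Counter()))
    return [target for target, _count in scores.most_common()]
```

With this data, `search(build_anchor_index(LINKS), "Stanford")` returns `["stanford.edu", "fan.example.net/photos"]`, no matter how often either page mentions Stanford in its own body text. The same mechanism is what a Google bomb exploits: enough links whose anchor text reads "miserable failure" put their target at the top for that query, whatever the page itself says.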

A few times people exploited this. For example, someone got a bunch of their friends to link to George W. Bush's official White House biography with the link text "miserable failure," so that when you googled "miserable failure," the top result was that biography. This is called Google bombing.

Another idea from their work showed how far off course most people, me included, were about how the Web should be organized. It was laid out in Sergey Brin's research paper, "Extracting Patterns and Relations from the World Wide Web." While everybody thought the way to bring order to the chaos of the Web would be to somehow persuade more and more people to write better web pages (semantic HTML, microformats, etc.), Sergey Brin just took what he was given. As an example, he talks about finding author-title pairs on the Web. He starts with a seed list of, say, five author-title pairs:

  Isaac Asimov, The Robots of Dawn
  David Brin, Startide Rising
  James Gleick, Chaos: Making a New Science
  Charles Dickens, Great Expectations
  William Shakespeare, The Comedy of Errors

and then simply combs the Web for patterns in the text around and between the elements of each pair. Instead of pristine microformats and "proper" HTML, he takes what he gets:

  <LI><B>title</B> by author (
  <i>title</i> by author (
  author || title (

and other patterns.

He uses these patterns to find more pairs, which he uses to find more patterns, and so on (http://ilpubs.stanford.edu:8090/421/1/1999-65.pdf).
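Here is a minimal sketch of that bootstrapping loop, assuming a tiny in-memory list of made-up HTML snippets in place of a crawl. A pattern here is the literal text between a known author and title, plus two characters of context on each side and a flag for which comes first. Brin's method, which the paper calls DIPRE (Dual Iterative Pattern Relation Expansion), is considerably more careful: its patterns also carry a URL prefix and longer context, and it rejects patterns that are too general. The names `find_patterns`, `apply_patterns`, and `dipre` are invented for this sketch.

```python
import re

# Hypothetical mini "Web": one HTML snippet per page.
PAGES = [
    "<LI><B>The Robots of Dawn</B> by Isaac Asimov (1983)",
    "<LI><B>Startide Rising</B> by David Brin (1983)",
    "<LI><B>Neuromancer</B> by William Gibson (1984)",
    "<li><i>Neuromancer</i> by William Gibson (1984)",
    "<li><i>Dune</i> by Frank Herbert (1965)",
    "<p>Charles Dickens || Great Expectations (1861)</p>",
    "<p>Mary Shelley || Frankenstein (1818)</p>",
]

CONTEXT = 2  # characters of prefix/suffix kept around each match (a simplification)


def find_patterns(pages, pairs):
    """Turn each occurrence of a known (author, title) pair into a pattern tuple."""
    patterns = set()
    for author, title in pairs:
        for page in pages:
            a, t = page.find(author), page.find(title)
            if a < 0 or t < 0:
                continue
            if t < a:
                order, (s1, x1), (s2, x2) = "title_first", (t, title), (a, author)
            else:
                order, (s1, x1), (s2, x2) = "author_first", (a, author), (t, title)
            e1, e2 = s1 + len(x1), s2 + len(x2)
            if e1 > s2:  # overlapping strings: no usable middle text
                continue
            prefix = page[max(0, s1 - CONTEXT):s1]
            patterns.add((order, prefix, page[e1:s2], page[e2:e2 + CONTEXT]))
    return patterns


def apply_patterns(pages, patterns):
    """Use each pattern as a regex to pull (author, title) pairs out of the pages."""
    pairs = set()
    for order, prefix, middle, suffix in patterns:
        # Captured elements may not contain tag brackets; lazy groups keep them short.
        regex = re.compile(
            re.escape(prefix) + r"([^<>]+?)" + re.escape(middle)
            + r"([^<>]+?)" + re.escape(suffix)
        )
        for page in pages:
            for m in regex.finditer(page):
                first, second = m.group(1), m.group(2)
                pairs.add((second, first) if order == "title_first" else (first, second))
    return pairs


def dipre(pages, seeds, max_rounds=10):
    """Alternate pairs -> patterns -> pairs until a round finds nothing new."""
    pairs = set(seeds)
    for _ in range(max_rounds):
        new_pairs = pairs | apply_patterns(pages, find_patterns(pages, pairs))
        if new_pairs == pairs:
            break
        pairs = new_pairs
    return pairs


if __name__ == "__main__":
    seeds = [("Isaac Asimov", "The Robots of Dawn"), ("Charles Dickens", "Great Expectations")]
    for author, title in sorted(dipre(PAGES, seeds)):
        print(author, "|", title)
```

Seeded with only the Asimov and Dickens pairs, the first round learns the bold-title pattern and the `||` pattern and picks up Brin, Gibson, and Shelley. The second round finds Gibson's book again in italics, learns the `<i>title</i> by author (` pattern from that page, and uses it to find Dune.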