Google 2000 vs. Google 2011 (mattcutts.com)
109 points by sjs on Feb 9, 2011 | 62 comments



Google 2k11 is a radically better product than Google 2k:

1) Crawling is now continuous, and fast. You no longer have to wait months for an index update to see a new site on the Googles. Even pages well down the tail of the importance distribution get crawled, indexed, and searchable in minutes.

2) Search in Japanese is ridiculously better. Just trust me on this. It has gone from being Hotbot2001 to (English) Google 2005 quality, which is impressive given the natural language problems Japanese throws at you, and the underdeveloped state of the link graph.

3) The Google symbiosis with particular sites on the Internet - most relevantly to my interests, Wiki and StackOverflow - has successfully incentivized the creation of sufficient basic and in-depth content that it is virtually inconceivable that you can come up with a generic English term and have a total failure on it. (e.g. the top ten for [manatees] will now exceed an eighth grade science report in informational content, which was a very hit or miss proposition in 2000.)

4) Google seems to have gotten over themselves in one critical respect: they used to vocally glorify non-intervention in the SERPs, as if G were the blind oracle of the net. This led to truly stupid issues like a particular neoNazi group ranking at #1 for "Jew" and Google running an AdWords ad against "Jew" to explain that Google was content neutral between Jews and Nazis and in this case the Nazis just happened to be more loved by The Algorithm so there.

I don't know about internally, but on the basis of external evidence, Google is now willing to intervene in search quality on behalf of users. This is a good thing. (Now if they were humble and cautious about wielding immense power, that would be a good thing, too. I get the persistent impression that they still think of themselves as the wee little underdogs, cuddly academics in the dorm room, rather than being Ma Bell.)


I can't speak for others, but my disappointment is with the margin of improvement. I know that results often sucked back in 2001, but with 10 years of progress I would hope to get better results. In the case of "buy domain name" (http://www.google.com/search?q=buy+domain+name), 4 of the top 6 results I get are How To articles, not sources to buy domains.

Google is trying to solve a hard problem, and I appreciate that. I just don't have the patience I did 10 years ago to click through all the results to find a legitimate site. If I get a set of results that look like spam, I try to refine the terms, but often just give up.

I know that Google tries to solve this problem algorithmically, but I can't help but think that mixing in a human review would immensely improve results. And by human, I mean a Google employee, not a sure-to-be-gamed community review.


I think the issue here is that for you the correct results set would be reputable domain name providers, whereas for many other people, the howto articles are in fact what they're looking for.

Without having personal search switched on, they need to hedge their bets. (note: not having tried it I don't know how effective the personalized search is) In general these populist algorithms may tend to skew the results against techies, or those who 'know what they want', in favour of overall increase in perceived quality.

As far as I know there is a human component, and there is definitely significant weight these days towards CTR as determining quality. These would likely just reinforce the populist ranking factors.

One of the overlooked aspects of Google is the sidebar - using the 'fewer shopping sites' and time-based queries can be a huge help to relevance. For years I've used search modifiers (+, .., - etc.) in my initial queries, anticipating what Google might expect. A search like "Buy Domain Names" still retains a lot of semantic ambiguity that their algorithm has to wrestle with. You have the tools to fix that yourself.

As such I tend to agree with Matt on this. We've just forgotten how bad it was.


for many other people, the howto articles are in fact what they're looking for.

That's a fair point, but one of the top 6 when I searched was from fourhourworkweek.com, so a whole other level of bias set in on my part. :)

We've just forgotten how bad it was

I still remember wanting to go to hotbot.com and accidentally typing hotbox.com. In a room full of people. So, some of my memories of early engine use are actually quite vivid.


Try misspelling a few other well known domains. That tactic is still in full force. I.e. http://goggle.com/ (hint: this is not google, so fill it in at your own risk).


I think most people searching for "buy domain name" (and not something more specific like "where to buy domain name" or "domain registrars") are probably looking for introductory info on how to go about purchasing their first domain.

In that context, these results could actually be considered better for most people.


From my results with that search, only 2 of the top 6 are How To articles, and they are the last two of the six.


While I also agree that Google's search results have gotten better, I just feel (this is very subjective of course) that they should be far better today than they currently are. I don't think they're leaps and bounds better and I think they should be.

I'm not one to constantly point to yesteryear and claim they were better but I do look at where they are today and think they should be further along than they are.

I guess when I look at certain segments of technology in 2000, some other companies have stayed more focused on their primary goal and have made phenomenal achievements since then. For example, OS X wasn't released to consumers until 2001 and the iPod didn't exist. Yet look where we are today. I have an iPhone in my pocket with incredible computing power, a very good refined interface, and a fantastic OS. Apple has stayed very focused on OS and consumer device design and pushed the field forward very far, with everyone else scrambling to catch up. This is their main business, and while they have other side projects (like the AppleTV) they stay focused on their main thing which is making great devices for consumers.

Google, by contrast, whose primary business was search+, now operates in many different business segments. Their focus has widened greatly and now they have email, payment processing, online document editing, 2 operating systems, automated cars, and the list goes on and on. Now, I have great love for many of these products and services so I do not suggest that it was a bad idea for them to do them, but I think it is apparent that search has suffered as a consequence.

Had Google stayed more focused on search instead of all these side businesses, search could be far more developed than it currently is today. I think they got complacent while being on top and tried to find other places to allocate their resources. Which, from a business perspective, I understand, but from a search perspective it seems tragic.

Therein lies my disappointment with Google, not that they haven't been innovating or improving search, they have. But they have spent so much time and resources on other things that search is not as far along as it should be.

+ Certainly it could be argued that their primary business is and always has been advertising and search is just one of the many vehicles for pushing said advertising, but the word Google is now synonymous with search and, marketing speak aside, I consider it their primary focus.


>> Had Google stayed more focused on search instead of all these side businesses, search could be far more developed than it currently is today.

I agree. Basic text search could be better, but if Google would stay focused, they could develop other technologies to stay ahead, for example - searching for something in texts written in other languages (with machine translation), search in radio stations, search what was said in a video, etc...


Well for 10 years I think the search results have only marginally gotten better. Besides, 10 years ago, what google was doing was revolutionary. No other search engine had such vast coverage or better intent determination. Now the competition is intense, Bing is almost as good as Google if not better in some scenarios, and other guys like blekko are coming up with a different approach (and have a better way to handle spam by avoiding it completely).

I think Google is spreading itself too thin. Search is what they do the best and it's their bread and butter. They should stay focused on it.

Google is too engrossed in thinking about generating more ad dollars rather than making their core technology better.


      Bing is almost as good as Google
It's easier to be a follower (or ahem, copy search results). Also if I'm going to switch away from Google it must be something truly better (like the difference I experienced when trying out Google after using Altavista).

      and have a better way to handle spam by avoiding it completely
I like Blekko, but avoiding spam is an impossible dream. If there are search engines (or any filtering method, automatic or based on peer reviews) out there with less spam, that's because right now only Google matters. The only way to fight it is to be a moving target, which is what Google is doing.

Unfortunately spammers are also very creative, so it's a tough battle.


Re: bing copying Google, it has already been profusely discussed all over the internet and has been shown as nothing more than a PR stunt by Google folks to avoid discussing spam

Re: fighting spam, it's a tough problem indeed but so was indexing the ever expanding web. Both Bing and Google seem to have solved the indexing problem fairly well by now.

The issue with the spam problem is not so much how difficult it is to solve as the motivation. Clearly the spam generates a lot of cash for Google (and some for bing too) so they are less likely to be motivated to solve it.

Smartest of the minds are at work at Google so I find it hard to believe that they cannot solve it (or even make a very good attempt at it). It's their intent to not solve it and that's the reason I think they are trying to avoid discussing it.


Re: bing copying Google, it has already been profusely discussed all over the internet and has been shown as nothing more than a PR stunt by Google folks to avoid discussing spam

I'm assuming the downvotes are because of that part.

I keep seeing the argument that "Microsoft didn't answer all of Google's claims," but no supporting evidence. If anyone has some, I'd be curious to see it. (This is a legitimate request: I haven't been keeping score.)


At the risk of beating a dead horse, I'll try to summarize. Google believed that queries and clicks on Google search results were being used in Bing's ranking, so we ran an experiment. The experiment confirmed that clicks on Google search results are used by Microsoft in Bing's search rankings.

Things we don't know include:

- the degree to which those clicks are used in Bing's rankings.

- whether MSFT does Google-specific processing of the clickstream data they get, or whether clicks on Google are treated the same as some-random-website.com.

- how long clicks on Google search results have been used in Bing's ranking (months, years, etc.).

Those are open questions that Microsoft could best answer.


Thanks for the reply, Matt.

Google believed that queries and clicks on Google search results were being used in Bing's ranking

I thought it was the search results that Google accused Microsoft of stealing. I guess I need to go refresh my memory.

Things we don't know include...

Good summary, thanks. I don't guess anyone is very surprised that Microsoft isn't quick to offer up how it treats a single data stream.


"I thought it was the search results that Google accused Microsoft of stealing."

I'm trying to use very precise/neutral language to avoid the "copying/cheating" brouhaha. The crux of the issue is that clicks on Google search results are used in Bing's search rankings.


Really glad to see you reply here Matt. Never expected it. Kudos!

I think Microsoft did acknowledge using clickstream data as one of the 1000 signals for their relevance ranking. Now regarding the degree to which they are used, I don't have any internal knowledge of either MSFT or GOOG but from my understanding of how search works, words like "hiybbprqag" couldn't have existed in MSFT's index to return any results. So the only signal that could possibly contribute to the ranking of "hiybbprqag" was the clickstream data that was generated by the GOOG sting operation. So at least for that query term it must have been a big deciding factor. For the popular query terms, I doubt it's that high. I use both bing and google fairly regularly and I can tell that for most typical queries, bing results are comparable to Google but for the tail queries, google does a better job. If they were giving any special weight to google click data then they would have done it for tail queries (which they apparently cannot do as well as google) to improve their relevance there.

Regarding Google specific processing, again, the bulk of the search activity in the US happens on GOOG, so I can imagine a lot of search click data is generated on google.

As for the time, I am not sure why that even matters. From my usage of bing and google over the past few months, bing has been consistent in its behaviour. If this was a sudden change, I'd have seen a jump in bing quality overnight. Don't remember seeing that.


       Smartest of the minds are at work at Google 
       so I find it hard to believe that they cannot solve it
It is true that they may have an incentive to not solve it.

On the other hand, this is not the same as fighting email spam. If Google were to ban content farms, since they are so popular they could be hit with a lawsuit faster than you could say "ehow".

Sure, Blekko can do it, but "Blekko what?" if you know what I mean.

Also, since I'm speaking about content farms: lots of people read junk like that, including me from time to time when I can't find something better for my search. So "spam" lately is kind of relative to each demographic.

As I said: spammers are very creative and innovate a lot ;)


I was skeptical when Google Instant was going to "change the way people search" -- but in early retrospect, I think it really has for me. I now click on one result maximum (and often zero), relying instead on rapid search-term tweaking.

Even just a few years ago I'd expect to click through several results, or heaven forbid to the next page, for what I felt was a difficult query.

Between google instant and other improvements, I am increasingly trained to look ONLY at the top several results. When google doesn't deliver, the results seem worse than they used to, even when they aren't. I think we're forgetting how we used to have to search.


I think Groucho Marx could sum up this article as "Who are you going to believe, me or your own eyes?"

I didn't keep a record of my searches from 2000, but I do remember that I was extremely impressed and satisfied with google back then. That is no longer the case. I am frustrated on a daily basis by the results of many searches.

It's not really your fault though. I blame it on monoculture. Google has such a huge hold of the search market that it's probably not even worth SEO people's time to bother gaming other search engines. There is a whole industry feeding off your success and we all suffer for it.

After the China attacks earlier this/last year google put out a decree that they would phase out windows machines among their employees. I thought it made sense. Windows is too easy a target because of its success. The same is true in nature and the same is true in search engines.

The best we can hope for is a new search engine that will be insignificant enough in terms of market share to avoid the scammers. As long as google is the dominant search engine it will never get better.


> I am frustrated on a daily basis by the results of many searches.

I have some speculation on this. Are you sure that your frustration comes from unusually poor results? Could it be that we're so used to things being perfect now, that a page of mediocre results looks like the end of the world?

Look at 2000. No Shazam. If you heard a song on the radio, that was the end of the story. There were some services you could call for $5 and a human would help you identify it. Now in 2011 you press a button and wait 5 seconds.

The web had a lot less information on it. A few hundred million people were connected. Now, in 2011, a few billion people can connect. Huge shift in quantity. Very tough to keep the S/N going strong.

2000: No Wikipedia. Now in 2011 if Wikipedia isn't in your top results you might be upset, but in 2000 you were happy even though Wikipedia didn't yet exist!

So I pose the question to you: Are you absolutely, positively sure that Google's quality has declined and what you are seeing isn't just a side-effect of everything else being so awesome?


> Very tough to keep the S/N going strong.

I think there's also a problem that the quality of content that Google has to index is steadily declining. In the 90's, many people were keeping lists of links to good stuff they found on the internet. Google could crawl the links and make conclusions. Now, people hardly do that anymore because they can just google it.

So Google has to employ ever more powerful algorithms to maintain even the same level of search quality. I think here lies an existential threat to the search engines.


Yes you can probably whip out the Louis CK. I'll admit that part of it is acclimatization. I've come to expect good results all the time.

That doesn't detract from the fact that results are bad for many searches and that the google monoculture has thwarted its earlier success. I'm absolutely sure that my satisfaction with their product has declined. I know I look forward to a better search engine than google. In 2000 google was the better search engine.


I would say the noise ratio for the whole internet increased because people found out that they could make easy money with content-void or misleading websites.


I think what Matt is trying to say is that your memory of what Google was like in 2000 could be inaccurate. So he's demonstrating what Google was actually like in 2000.

Of course you were impressed and satisfied with Google back then — it was so much better than the alternatives. I'm sure you would have been just as (perhaps more) satisfied if 2011 Google existed in 2000. We've had 11 years to pinpoint the deficiencies of Google. Couple that with 11 years of rapid progress on the internet — we just expect more now.

What's confounding to me is that there's an obvious solution to Google's biggest problem (spammers and scammers):

  1. Allow users to individually block URLs and domains on a permanent basis.
  2. Accumulate massive amounts of data regarding blocked URLs/domains (a good quality indicator)
  3. Integrate this data into the PageRank algorithm
I'm not sure why it's taking Google so long to do this, at least step 1.
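
A minimal sketch of steps 1 to 3, assuming a hypothetical in-memory store. `BlockSignal`, `block`, and `penalty` are illustrative names, not anything Google exposes, and the multiplier formula is an arbitrary choice made only to show the shape of the idea.

```python
from collections import defaultdict


class BlockSignal:
    """Hypothetical store of per-user domain blocks (not a real Google API)."""

    def __init__(self):
        self.blocked_by_user = defaultdict(set)  # user -> domains hidden for that user
        self.blockers = defaultdict(set)         # domain -> users who blocked it

    def block(self, user, domain):
        # Step 1: a user permanently blocks a domain.
        self.blocked_by_user[user].add(domain)
        # Step 2: accumulate who blocked what; sets ignore repeat blocks.
        self.blockers[domain].add(user)

    def penalty(self, domain, total_users):
        # Step 3: a ranking multiplier between 0.5 and 1.0, where 1.0 means
        # nobody blocked the domain. The linear form is an assumption.
        if total_users <= 0:
            return 1.0
        share = len(self.blockers[domain]) / total_users
        return 1.0 - 0.5 * share


signal = BlockSignal()
signal.block("alice", "spam.example")
signal.block("bob", "spam.example")
signal.block("bob", "spam.example")  # a repeat block does not count twice
print(signal.penalty("spam.example", total_users=4))
```

Counting distinct users rather than raw block events is one cheap guard against a single account inflating the signal, though it does nothing against many fake accounts.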


A reason could be that it would slow down searches too much. Firstly, all your searches would have to hit a server storing your blacklist. Secondly, it makes caching results of popular queries nigh impossible. Finally, some people will build blacklists so large and convoluted that the top hit on some of their searches would only be on Google's page 20.


One might think that personalized domain blacklisting is simpler than social search. http://www.google.com/support/websearch/bin/answer.py?hl=en&...


How about they build it in as a feature of Chrome? The servers can still send the canonical list, using cached results and all, but then it gets filtered down (and perhaps in the future reranked a bit) on the user's machine. As a bonus, it increases the value of switching to Chrome.
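
The client-side idea can be sketched in a few lines, assuming the server returns the same cached, ordered list of result URLs to everyone and the blocklist never leaves the user's machine. `filter_results` and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def filter_results(results, blocked_domains):
    """Drop results whose host is a blocked domain or a subdomain of one.

    `results` is the canonical, server-ranked list of URLs; order is preserved.
    """
    blocked = {d.lower() for d in blocked_domains}
    kept = []
    for url in results:
        host = (urlsplit(url).hostname or "").lower()
        parts = host.split(".")
        # Check the host itself and every parent domain: a.b.c -> a.b.c, b.c, c
        suffixes = {".".join(parts[i:]) for i in range(len(parts))}
        if not suffixes & blocked:
            kept.append(url)
    return kept


page = [
    "http://good.example/a",
    "http://www.spam.example/b",
    "http://other.example/c",
]
print(filter_results(page, {"spam.example"}))
```

This keeps server-side caching intact, but a long blocklist can still leave a page with very few results, so the client would need to fetch further pages to fill it.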


Aha, good explanation, thanks. I still have to think with all of Google's brainpower, they could figure something out. They already allow "starred" results, which I presume would function somewhat similarly to a blacklist from a technical standpoint (just showing instead of hiding).


Sure they could use other ranking methods like user black lists, white lists (bookmarks/favorites), timing the number of seconds until someone back buttons, etc... But as long as google garners the majority of the search market it will all be manipulated.


My experience is entirely different.

I do a lot of Perl. It used to be that as soon as something on CPAN matched my search, I'd get a handful of results, then it would find the CPAN match, then I'd get pages and pages of the same result from different CPAN mirrors. Now that no longer happens, and my searches for half-remembered Perl discussions are much, much less likely to end in frustration.

But that's me, and YMMV.


We were all very impressed with IE6 in 2001 as well and now even with IE9 that is no longer the case.


I'm not sure that a single hand picked result shows much. If Matt has 40,000 results, how about opening them so we can get an idea of how things have changed for queries we care about, and letting people do an independent review?


I guess the whole backlash against search results was Content Farms outsmarting Google (at least temporarily) and distorting results.

Google's search results in 2000 were better than any competitor or Google itself 2 years before. But the same thing can't be said with certainty when comparing 2010's search results to 2008's.


Google's spam filter is training the internet to produce better and better content. Over time a rising tide lifts all boats.

In 2000 they were working on not returning results from various subdomains off the same base domain.

This year they're working on content farms.

Next year we won't be talking about plagiarized or duplicated content. We'll be talking about Google giving us content that is superficially different but not unique enough.

Five years from now we'll be talking about how Google is giving us unique, relevant content in the first ten results, but it's not personalized enough. Et cetera.

Just like in the AI world, every time a computer does something we thought only a human could ever do, instead of admitting defeat, you say, "This just proves that chess|poetry|music composition|playing Jeopardy doesn't take real intelligence." The truth is that you're not defining intelligence, you're moving goal posts. And if there's one thing Matt's post shows, it's that we moved the goal posts.

I think Matt likes the criticism. When Google's critics have to move to more and more sophisticated attacks, it shows that Google's facility is growing.

Google is not just raising the bar on the whole internet. You are raising the bar on Google.


I think it's about content: Google has transformed the web into spam gardens by promoting crap sites.

Because Google pushed low substance sites high in their search results, this created a vicious cycle where sites were rewarded for creating mass volumes of inane 'content', which has now become a huge percentage of the web.

Google isn't just an observer to the system, search results impact the ecosystem as a whole. A complicated environment is something that is challenging to test and to predict, although that is what Google internally is geared for.

Personally I'm also annoyed by Google's Adsense policies that emphasize inane content. (They blocked me without warning for a post of a painting that included a nude figure.)


For me personally, google is much, much worse than before. I don't find what I need.

What I think is sad is that alternative search engines still work almost the same as google.

Of course he's going to find spam if he searches for the most spammable content. But that's not what I usually search for - mostly I search for programming related topics or troubleshooting. The old google would give me the right result straight away, the new google gives me some page on the main vendor's site that is unrelated.

There are many ways niche search engines could come up to solve this problem, but none of them seem really interested.


Google isn't strict enough for programming queries anymore, for example this query yields no results:

    "hello-world" -"hello world"
It ignores the dash even though I'm using quotes.
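
For comparison, here is roughly what a strictly literal reading of that query would do, as a sketch: keep documents containing the exact string `hello-world` and drop those containing `hello world`. `strict_match` and the sample documents are made up for illustration; search engines generally tokenize text before matching, which may be why the dash ends up ignored.

```python
def strict_match(docs, must_contain, must_not_contain):
    """Return docs containing every required phrase and none of the excluded ones.

    Matching is plain case-insensitive substring search, so punctuation such
    as "-" is significant.
    """
    hits = []
    for doc in docs:
        text = doc.lower()
        if all(p.lower() in text for p in must_contain) and not any(
            p.lower() in text for p in must_not_contain
        ):
            hits.append(doc)
    return hits


docs = [
    "git clone hello-world.git",
    "print hello world",
    "hello-world prints hello world",
]
print(strict_match(docs, ["hello-world"], ["hello world"]))
```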


Could you give us an example or two?


I don't get people who say Google is getting worse myself. If any of you know search engine building so well and aren't working for another search engine (like DuckDuckGo) why not take your expertise to Google and help fix the problem?


You bring up an interesting point. It would be nice if there was some sort of way real people could rank results. Even as an engineer, I am often frustrated by the results I get, but I have a hard time articulating what Google could do to show me what I wanted on the first page.


Please take a look in earlier threads. Google doesn't even listen to patio11 before he writes a blog post that reaches #1 on HN.


Actually, I passed patio11's blog post around widely within Google. Feedback like that is welcome because it prompts us to think about what we need to do better.


You passed patio11's blog post around before or after it hit HN front page? (If it was distributed inside Google before reaching HN front page, I was wrong.)


FYI DDG just uses the bing API and then does some fancy stuff on top of it, so technically they're not a "better search engine" they're just a prettier bing.


FYI: no, actually we are a hybrid search engine. We have our own crawling and indexing that is merged with about 15 external APIs.


"hybrid search engine" is just worthless words, what does it even mean? If you want people to stop saying you're just syndicating others' data, publish the amount of others' data you do use, but when I've compared bing.com searches with yours they're very similar whereas google is different, which would imply you're heavily taking from bing, whether or not you crawl yourself...


So I'm required to divulge information to stop you from spreading false claims?

You said "they're just a prettier bing." I have ~100K lines of code in front of me right now that has nothing to do with UI, although I think UI is tremendously important and has been a primary focus as well.

Nevertheless, I've talked about all of this at length numerous times -- in videos, on HN, my blog, etc. Bottom line though is you can compare for yourself. We do certainly draw on Bing, but its importance will heavily vary per query and for many queries we'll look completely different. It really depends.


False claims? It's what you've said[1][2], albeit a long time ago, but because I can't find any up to date information that you've published (care to link to it?) I can't suddenly know something else.

You're not required to do anything, you can tell me to piss off if you like, but if you want people to claim things and for them to be factually accurate you should publish this information, and if you have then please link to it because I'm struggling to find it.

If you want people to stop spreading misinformation (intentional or mistaken) then spread the correct information, otherwise learn to accept that when you say you use the bing api but refuse to explain to what extent, people will assume your site is entirely powered by it.

[1] http://www.gabrielweinberg.com/blog/2009/02/thoughts-on-yaho... [2] http://www.gabrielweinberg.com/blog/2009/03/duck-duck-go-arc...


Yes, false claims. You claimed that "DDG just uses the bing API" to get results. That is false, and I don't remember ever saying that anywhere because it is false.

Those two posts do not say otherwise. They mention that we use BOSS, which is of course true. The distinction is that we do not just use BOSS by any means.

I actually started out crawling the Web before BOSS even existed, and still crawl quite a lot. As for recent info, look no farther than the DDG FAQ, or my HN comments. This is from just 3 days ago: http://news.ycombinator.com/item?id=2184873

Note: no one was requiring you to make any claims, false or otherwise. So I still don't get why I need to divulge information to stop you from making false claims.


DDG does its own crawling in addition to using the Bing API. I'm unclear on how the meme that they don't do any crawling has gotten widespread enough to be brought up every time the site is mentioned.


Your claim that DDG uses the bing API is like claiming how perfectly you know the back of your hand. You can only talk when you know some facts, and those facts can be acquired only when you work at that company. And please distinguish between fact and opinion. Your claim doesn't make any sense when the founder himself is saying that's not true.


Well, how do you know what it does? Have you seen the code? Do you know what exactly it does? Making claims such as these without knowing the facts is certainly not professional behaviour.


Unless of course Bing is a better search engine to begin with. (Well that's arguable so I concur :) )


Could it be that people who are having a hard time finding things on Google now are searching the same way they did in 2000? Google might (and arguably should) be attempting to cater to the way people are searching now, not the way they searched 10 years ago. Don’t forget that people using a search engine need to evolve as well.


It would be interesting if they set up a Google Retro and let people try and compare the old search and see if they still think it is an improvement over the newer systems.


it would be mostly useless. the content today on the web is very different from the content even a couple of years ago.


Here is what I commented on Matt's blog:

Well, your Google is better, but expectations are higher, as is the number of Google employees. So when we see the same content farm again and again for weeks without any action being taken, it annoys. So while blaming "Google when the information just doesn’t appear to be on the web at all" is clearly too high an expectation, fixing SPAM issues, be it case by case, in a matter of days instead of weeks seems reasonable to me given the resources you're _supposed_ to have.

So, HNers, my question is: is the search spam team under-staffed, or is the problem elsewhere?


I definitely agree that search results in '11 are better than in '00, although it would be cool if more search results from '00 were given since 1/40000 isn't the most representative of examples.


There weren't any content farms in 2000, and SEO wasn't as sophisticated.

Now both are quite pervasive, and are having a marked impact on the qualitative assessments of Google search results.

On the quantitative side, Cutts has recently claimed:

... according to the evaluation metrics that we’ve refined over more than a decade, Google’s search quality is better than it has ever been in terms of relevance, freshness and comprehensiveness.

How does the increase in SEO, content farms, and other questionable results (scraped content, etc.) figure into Google's evaluation metrics?


Good question. We've talked about evaluation a little bit before, e.g. http://googleblog.blogspot.com/2008/09/search-evaluation-at-...

When we evaluate our quality or a new algorithmic change, URLs can be rated as useful, navigational, etc. They can also be rated as spam. Useful/navigational URLs have higher scores, while a spam rating subtracts from the score. If an algorithm change tends to rank higher-rated URLs higher, that's good. If spam tends to rise in the rankings, that's bad.

What our metrics tell us is that Google has gotten better in overall search quality in the last few months, despite also seeing an increase in spam. It's safe to say that now we're putting a lot of effort into the spam side of things to get that back down to the levels we want.
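
To make the scoring described above concrete, here is a toy version of such a comparison. The rating labels come from the comment; the numeric scores, the 1/rank position discount, and the function names are assumptions for illustration, not Google's actual metric.

```python
# Hypothetical per-rating scores: useful/navigational add, spam subtracts.
RATING_SCORE = {"useful": 1.0, "navigational": 1.0, "unrated": 0.0, "spam": -1.0}


def ranking_score(ranked_urls, ratings):
    """Sum rating scores discounted by rank, so top positions count most."""
    total = 0.0
    for rank, url in enumerate(ranked_urls, start=1):
        label = ratings.get(url, "unrated")
        total += RATING_SCORE[label] / rank
    return total


def change_is_improvement(before, after, ratings):
    """True if the new ordering scores higher than the old one."""
    return ranking_score(after, ratings) > ranking_score(before, ratings)


ratings = {"a": "useful", "b": "spam", "c": "navigational"}
old_order = ["b", "a", "c"]  # spam ranked first
new_order = ["a", "c", "b"]  # spam pushed to the bottom
print(change_is_improvement(old_order, new_order, ratings))
```

Under this toy metric, overall quality and spam can both go up at once: adding more useful results near the top can outweigh a spam result creeping up lower down.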


Is the post suggesting that the perception of drop in quality is only because google was much much better than other search engines back in 2000?



