Google Instant, behind the scenes (googleblog.blogspot.com)
81 points by arfrank on Sept 9, 2010 | 45 comments



I'm amazed that users couldn't tell the qualitative difference between submitting a form and find-as-you-type. To a hacker, AJAX is a whole other experience, but to Joe User it's just a bit faster. Something to keep in mind if your AJAX interface isn't actually faster.

Yet another unconscious programmer assumption shattered.


Google Instant is shockingly fast. It is hard to imagine a small startup creating it and having it work for a few beta users. Even harder to imagine Google scaling up their infrastructure 10-fold.


What this is really showing is how fast the underlying infrastructure already was. It had been optimized to the point that it really couldn't get any faster without speeding up the user themselves, so they did that.


I, for one, forget to appreciate the fact that one of the fastest web apps I regularly use is the one that searches the entire* internet.

*kinda


I would still love to get a good layman’s explanation of how Google and other search engines can be so fast. I’m not a developer but I know a few basic concepts. Sorting makes searching fast – but that fast?

I can imagine that if I’m searching for “ball” Google already has the PageRank sorted results somewhere on its servers. That seems easy enough. But what about arbitrary search terms? Searching for “tango durlast 1978” (a soccer ball) gives me results just as fast, does Google have all of that pre-computed somewhere?

If they did that (to pick an arbitrary number) for search terms up to 100 characters (say all 26 letters, ten digits and space) that would mean that they would have to have something like 37^100=6.6e+156 pre-computed lists of results somewhere on their servers.

That can’t be what they are doing. If they were to use only one byte per possible search term (to pick another arbitrary and completely ridiculous number) they would still need to store 6.6e+144 terabytes. Which seems kinda impossible.
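
Sanity-checking my own arithmetic in Python, purely back-of-the-envelope:

  # 26 letters + 10 digits + space = 37 symbols; queries up to 100 chars.
  alphabet_size = 26 + 10 + 1   # 37
  max_len = 100

  # Distinct queries of length 1..100 (dominated by the length-100 ones).
  total_queries = sum(alphabet_size ** k for k in range(1, max_len + 1))
  print(f"{total_queries:.2e} possible queries")   # ~6.8e+156

  # At one byte per query, expressed in terabytes (1 TB = 1e12 bytes).
  print(f"{total_queries / 1e12:.2e} TB")          # ~6.8e+144 TB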


1) To be able to "search (almost) the whole internet" you have to pull the content down and keep "almost the whole internet" on your own servers.

2) You have to update your copy of the internet regularly.

3) You don't want to search all of that whenever somebody types a few words. So you write programs that regularly prepare indexes in advance (just like the index in the back of a book), and when a query comes in you can "just" look the words up in those (toy sketch below).

The steps above sound simple, but you can imagine that the devil is in the details: there's a lot of know-how and a lot of money needed to do it really well. Just ask Microsoft, or any failed search company.
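
A toy version of the index from step 3, in Python (nothing like real scale, obviously):

  # Minimal inverted index: the "prepare in advance" step from point 3.
  # Toy documents stand in for the crawled copy of the web.
  docs = {
      1: "the quick brown fox",
      2: "the lazy brown dog",
      3: "quick quick dog",
  }

  # Built once, offline: map each word to the set of documents containing it.
  index = {}
  for doc_id, text in docs.items():
      for word in text.split():
          index.setdefault(word, set()).add(doc_id)

  # At query time you "just" look up and intersect the posting sets.
  def search(query):
      postings = [index.get(w, set()) for w in query.split()]
      return set.intersection(*postings) if postings else set()

  print(search("quick dog"))  # {3}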


There are two basic ways to build a search engine.

First think of the web as a big matrix with each row being a document and each column being a term. The first step is to decide how you want to shard this matrix across multiple machines.

You can shard by row, which lets you do really complicated scoring, but you have to hit every machine for every query.

You can shard by column, which is much cheaper, but limits how sophisticated your scoring can be.

Within each machine the problems are all probably fairly familiar.
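
To make that concrete, a toy contrast in Python (shard layouts and names are made up):

  docs = {1: "brown fox", 2: "brown dog", 3: "quick fox"}

  # Row sharding: each shard indexes a subset of the DOCUMENTS.
  row_shards = [
      {1: "brown fox"},                  # machine A
      {2: "brown dog", 3: "quick fox"},  # machine B
  ]

  def search_row_sharded(query):
      terms = query.split()
      hits = []
      for shard in row_shards:           # must fan out to EVERY machine
          for doc_id, text in shard.items():
              words = text.split()
              if all(t in words for t in terms):
                  hits.append(doc_id)    # full doc is local: rich scoring possible
      return hits

  # Column sharding: each shard holds complete postings for a subset of TERMS.
  col_shards = {
      "brown": {1, 2}, "fox": {1, 3},    # machine A
      "dog": {2}, "quick": {3},          # machine B
  }

  def search_col_sharded(query):
      postings = [col_shards[t] for t in query.split()]  # touch only owning machines
      # Only doc IDs come back, so scoring has less context to work with.
      return set.intersection(*postings) if postings else set()

  print(search_row_sharded("brown fox"))  # [1]
  print(search_col_sharded("brown fox"))  # {1}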


Doesn’t that matrix get big fairly quickly? Are there any tricks to keep it as small as possible?


One trick I think people underestimate the effect of is that Google caches queries. When you're searching for "Madonna", you are not causing the web index to be traversed, you're retrieving the saved search for "Madonna". Google can take a couple of seconds if you throw a truly novel query at it.

Of course, storing millions of "common queries" isn't a joke either, and that still doesn't answer the question of how they search the internet within a few seconds anyhow; I'm not trying to give a complete answer, just to point out one aspect. Google Instant, by design, serves the Google Suggest queries the vast bulk of the time, and I guarantee those are either largely or possibly even entirely pre-cached, so you're not doing 10 web searches in two seconds, you're doing 10 hash lookups in two seconds. Which is still damned impressive; I can hardly keep my web site loading in less than two seconds for one hit where I work.
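
To make the point concrete, here's the idea in toy Python form (the two-second search is obviously a stand-in for a real index traversal):

  import time

  cache = {}

  def slow_web_search(query):
      time.sleep(2)                # pretend this walks the whole index
      return [f"result for {query!r}"]

  def search(query):
      if query not in cache:       # novel query: pay the full cost once
          cache[query] = slow_web_search(query)
      return cache[query]          # popular query: just a dict lookup

  search("Madonna")   # ~2s: cache miss
  search("Madonna")   # ~instant: cache hit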


Major parts of that aren't true. I think something like 30% of queries have never been seen before, and Google provides low latency on those as well.


Cut him some slack; it was clearly a simplification. Also, you're wrong to declare him inaccurate.

Even if 30% of queries are truly unique (a number people have quoted for years; it may no longer be accurate), caching the other 70% of queries is a big win. Also, instant search relies heavily on Google Suggest, which by its very nature offers queries that have been performed already, so they are trivially cacheable.


I've always used lengthy, quoted search phrases, but what's really interesting is that Google's autocomplete has slowly trained monkey-me to follow the well-worn path and use the search terms it suggests.

It's a great example of using the UI to encourage a desired behavior.


So, let's just turn off caching and triple or quadruple the size of the Google cluster, eh? No big deal. I completely fail to see how you think that disproves my point.


I should probably mention that I work in search quality at Google. The part I took issue with is "Google can take a couple of seconds if you throw a truly novel query at it." which is definitely not true.

Google can't cache nearly as much as you would expect, not because the queries change, but because the results change.


It gets fantastically huge when you look at quoted searches. Sure, you can build an index of key words to documents containing them, but how do you build an index for literal text searches like "to the bridge, he said"?
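
My naive guess is a positional index - store where each word occurs, then check adjacency at query time - something like this toy sketch:

  # Toy positional index: postings record WHERE each word occurs,
  # so quoted phrases can be matched without storing every phrase.
  docs = {1: "he walked to the bridge he said",
          2: "to the bridge he said goodbye"}

  index = {}  # word -> {doc_id: [positions]}
  for doc_id, text in docs.items():
      for pos, word in enumerate(text.split()):
          index.setdefault(word, {}).setdefault(doc_id, []).append(pos)

  def phrase_search(phrase):
      words = phrase.split()
      hits = set()
      for doc_id, positions in index.get(words[0], {}).items():
          for p in positions:
              # the phrase matches if word i appears at position p + i
              if all(p + i in index.get(w, {}).get(doc_id, [])
                     for i, w in enumerate(words)):
                  hits.add(doc_id)
      return hits

  print(phrase_search("to the bridge he said"))  # {1, 2}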


Yes. It's huge.


Some typical search engine strategies are on this page (and specifically in this subsection): https://secure.wikimedia.org/wikipedia/en/wiki/Index_%28sear...


It's a combination of embarrassing parallelism and clever indexing.


The internet that Google doesn't search is almost always not the internet you want to find. :)


Well, it wasn't really 10-fold. And it didn't just mean buying more servers; most of the work was probably done in software. This is another feature made possible by the recent Caffeine expansion: http://mashable.com/2010/06/08/google-rolls-out-its-new-and-...


I agree. It's an impressive interface. If this really proves to be a desirable improvement from the end user's perspective, it may take Bing YEARS to duplicate the experience unless they already had something like this in the works; it takes a lot of coordinated effort to make something like this possible at scale. Congrats to the Google team!


Technologically it's not very difficult, IMO, once you already have search engines that are as fast as Google and Bing.

The tough part for Bing is that, with fewer searches, their predictive engine isn't quite as good. But this is where the Yahoo deal will pay off.

But all they need to do is fire off predictive searches and bring those back -- and of course, each predictive search only needs to bring back ten or fewer results.

Bing already does the predictive text suggestion. They just need to send back results along with the actual text suggestion (sketched below). At worst they'd need to get some more servers. But I fully expect they could roll this out in months if they wanted to.

Although personally I'd still focus on relevance. There's so much stuff both engines suck at, I'd love for them to fix some of those holes first.
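
Roughly what I have in mind, as a toy sketch (every name here is made up):

  # As the user types, resolve the prefix to a predicted query, then
  # serve the cached top-10 for that prediction. suggestions and
  # cached_top10 are hypothetical stand-ins for real backend services.
  suggestions = {"mad": "madonna", "madon": "madonna", "tan": "tango durlast"}
  cached_top10 = {"madonna": ["madonna.com", "..."],
                  "tango durlast": ["1978 world cup ball", "..."]}

  def instant_results(prefix):
      predicted = suggestions.get(prefix, prefix)   # predictive engine
      # Each keystroke costs one suggestion lookup plus one cached
      # fetch of <= 10 results, not a fresh index traversal.
      return predicted, cached_top10.get(predicted, [])

  for prefix in ["mad", "madon"]:
      print(instant_results(prefix))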


Not sure about that. Bing has far less traffic, scaling should actually be easier for them. And Microsoft certainly has resources.


I would have loved to be a fly on the wall at Microsoft when someone said - 'That's weird, have you tried google today?'

I assume some of the ajaxy Google logos have been leading up to this - especially the one that coloured in the letters as you typed.


That's not much of a "behind the scenes". Granted, this is still rather impressive. Would love to get more technical details on it.


I think the most interesting piece of that article is that they are caching per-user results. So not only have they made result display dynamic, they've made the performance unique to everyone using the system.


They capture my keyboard after I search from the menu bar of Chrome, so I can't hit backspace to return to the previous page. Even if I click outside of the box, any Google search page redirects my backspace into the search box, which also changes the results.

It's rather annoying.


My issue is I almost always search from the address bar of chrome, or the search box of any other browser, so I never see the "instant" effect. I can't remember the last time I actually went to google.com except when linked to see a cool logo design.


Yeah, they broke one of my search patterns: search for something, read the answer from the results. I do this almost as much as following any of the results, because for a quick answer to a question the answer is usually in the summary of the link. Now, however, I open a new tab, google loads, I type the question and hit Enter, and when I CMD-W to close the tab, Safari stops me to ask if I'm really sure I want to do this.

This seems like such a tiny thing while I'm writing it down, now, but it's frustrated me steadily today; I've context-switched out of whatever I was thinking about more than a dozen times to figure out why I was getting a popup warning. Finally I just turned it off in search settings. The barely-noticeable increase in speed just isn't worth it.


Anyone else curious how long those 50 person stand-up meetings took?


They were tightly run and limited to 15 minutes. Usually only team leads talked and there wasn't much back and forth.

It's more normal at Google to have fewer people and 30- or 60-minute meetings.


Even by the scrum ideal of <1 minute per person, that's longer than most traditional "status meetings." Must've been loads of fun.


The couple I sat in on were 10-ish minutes (I think the NBC Nightly News segment has a program manager saying "Alright, 12 minutes"). They worked because each person reported to their team lead what they were working on beforehand, and then at the scrum the team leads aggregated everything that was going on with their teams.


Makes sense. Few scrum teams get that large where you need to start thinking about optimizations like that.


In user studies, people quickly found a new way to interact with Google: type until the gray text matches your intention and then move your eyes to the results. We were actually surprised at how well this worked—most people in our studies didn’t even notice that anything had changed. Google was just faster.

If this is the case, then what's the point of showing the search results as the user types? It seems that just having predictions is sufficient.


They tested that:

For example, we tried a prototype where we waited for someone to stop typing before showing results, which did not work. We realized the experience needed to be fast to work well.


Is that what amichail is asking? It sounds like he's wondering why they don't just show the results updating with every letter you type, while not even bothering to show the dropdown predictive list of queries you may be typing.


Cursor down + return is pretty fast.


It’s never fast enough until people can’t tell the difference. “Pretty fast” is meaningless in that context. What matters is whether it is perceptibly faster or slower.


I think the real issue is something that Google is not telling us.

Maybe most people can't figure out they can press cursor down + return.

Or maybe this is a flashy way to distinguish their product from the competition.

Or maybe this is a way to encourage users to use shorter queries (as they can see how effective they are from the search results), thus forcing advertisers to pay more for more general terms.


I vote “people can't figure out they can press cursor down + return.” And that’s a perfectly good reason for changing the UI.


Even those who can figure it out probably benefit from fewer necessary steps, allowing them to pay less attention to the process of searching and more attention to the content.


I actually like Google Instant a lot. If I don't know exactly what I'm looking for - let's say I'm looking up some obscure HRESULT error handling thing, which I was - I can basically play with my query in real-time. It makes this kind of digging much faster.


It's kind of like a REPL for your search queries. Kinda.


One qualitative difference: the "I'm feeling lucky" button (which you can use if you hit right-arrow on a search suggestion) has actually become useful. You can start typing, scroll down to an entry you like, and check the snippet of the top result -- all without clicking any buttons.

This is a pretty big deal to people like me who go to great lengths to avoid using the mouse. I've not tried it much because I don't usually search from the google homepage, but maybe I'll try it now.



