Larry Page on Real Time Google: We Have To Do It (readwriteweb.com)
37 points by Anon84 on May 19, 2009 | 25 comments



When I watched the LOST season finale last Wednesday, I tried to find the answer to the question "What lies in the shadow of the statue?" right after the show ended. (During the show, the answer was given in a different language.)

Google real-time search had picked up the answer from both a TV forum and Yahoo! Answers in about 5 minutes, and it took about 55 minutes before anyone had an answer in Twitter's search results. I felt a bit let down by my first use of "real time" search from Twitter.


All hail John Locke.


Please don't do this. Google is already useless enough as is. New HN comments get indexed in like five minutes, but then they're gone in a month or two. Same with all content lately: it gets indexed really fast and then it's gone. It's gotten to the point where I'm looking for stuff I know exists, I know the keywords used on the page, and I still can't find it because Google just doesn't index that page anymore, even though it's relatively popular. (For what it's worth, I was looking for a lecture that IIRC was submitted here a while ago and was basically a defense of pure math research in universities.)

I actually used Yahoo! this week for the first time in a decade because Google just doesn't return good results anymore.


I agree. I haven't used Google for months. It's so Web 1.0. It's not even in the cloud. It's in datacenters! Google stopped being relevant the moment Twitter burst onto the scene with its paradigm-shifting earthquake. Everyone I know now just tweets people asking for stuff instead of using Google. That way you get 'realtime' results, and Twitter is immune to spam or gaming. It's impossible. Either that or they go straight to WolframAlpha, which has largely killed off Google in recent days.


Can you explain this more? Why would HN comments vanish? Doesn't google follow all links?


HN was not designed with SEO and linkage in mind; after a while, pages can become "orphaned" as new stuff pushes them down. No search engine follows all links. As a practical matter, they generally give up 6-8 links away from the home page.
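The orphaning effect described above can be illustrated with a depth-limited breadth-first crawl. This is a simplified sketch, not how any real search engine schedules crawling (real crawlers also use sitemaps, link priority, and revisit policies). The site layout (a chain of "older" listing pages, each linking to a few items) and the depth limit of 7 are hypothetical choices picked to match the 6-8 figure in the comment.

```python
from collections import deque


def crawl(graph, start, max_depth):
    """Return the set of pages reachable within max_depth links of start."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in graph.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen


def build_site(listing_pages, items_per_page):
    """Hypothetical site: page0 is the front page; each listing links to its
    items and to the next (older) listing page."""
    graph = {}
    for p in range(listing_pages):
        items = [f"item{p}_{i}" for i in range(items_per_page)]
        older = [f"page{p + 1}"] if p + 1 < listing_pages else []
        graph[f"page{p}"] = items + older
    return graph


site = build_site(listing_pages=20, items_per_page=3)
indexed = crawl(site, "page0", max_depth=7)
# Items on listing page p sit p + 1 links from the front page, so only
# items on pages 0-6 are reached; everything older is effectively orphaned.
```

Once an item slides far enough down the "older" chain, no path of length 7 or less reaches it, even though it is still linked from somewhere on the site.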


Google search for this site still returns years-old results:

http://www.google.bg/search?q=site:http://news.ycombinator.c...

(For better results, try "800 days ago" with quotes; HN strips them for some reason.)


Actually today Google seems to be indexing most of the site. But usually when I do searches like:

site:news.ycombinator.com alex3917 "rule of thumb"

Only half of those posts show up, and to find the rest I need to use searchyc.com.


Personal anecdote; I have no idea whether it's 100% related to the OP's point:

http://news.ycombinator.com/item?id=576727

Google "80-19-1" today and it's not even on the second page (previously it was the third result for me too).

I guess the OP's point is that Google is "happy" with fresh content and puts it on the front page, then the evil algorithms ;) push it farther down.
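The "fresh first, pushed down later" pattern guessed at above can be modeled with a freshness bonus that decays over time. This is a purely hypothetical toy score (the boost of 2.0 and the 24-hour half-life are invented for illustration); nothing in the thread says how Google actually weights freshness.

```python
def score(relevance, age_hours, boost=2.0, half_life_hours=24.0):
    """Hypothetical ranking score: base relevance plus a freshness bonus
    that halves every half_life_hours."""
    freshness = boost * 0.5 ** (age_hours / half_life_hours)
    return relevance + freshness


OLD_PAGE_SCORE = 2.5  # hypothetical established page whose bonus has decayed away

new_comment_today = score(relevance=1.0, age_hours=0)      # 1.0 + 2.0
new_comment_in_2_days = score(relevance=1.0, age_hours=48)  # 1.0 + 0.5
```

Under these assumptions the new comment outranks the established page on day one (3.0 vs. 2.5) and drops below it once the bonus has decayed (1.5 vs. 2.5), which matches the anecdote of a page falling from third place to off the second page.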


I have a feeling Larry is being sneaky here and trying to misdirect competitors. I don't doubt that indexing content in real time and making it searchable has some utility (see the comment on this article about a question on the TV show LOST), but maybe 1/100 of my searches are like that. If my search for Ida, the "missing link" fossil, is going to be polluted by hundreds of results from people tweeting about its discovery, I'd rather not have real-time search.

On the other hand, this is not an either/or proposition. I am OK with this as long as Google keeps the tweet search results separate but equal, somewhat like they keep blog results separate via Blog Search but equal in that blog posts with good PageRank do appear in the main search results. (Although I can imagine few, if any, individual tweets having a very high PageRank.)
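One way to read the "separate but equal" proposal above is as a blending rule: tweets live in their own vertical, and only a few that clear a high score threshold are merged into the main results. This sketch is hypothetical; the `(id, score)` tuples, the 0.8 threshold, and the two-tweet cap are illustrative assumptions, not Google's behavior.

```python
def blend(web_results, tweet_results, k=10, tweet_threshold=0.8, max_tweets=2):
    """Hypothetical blending rule. Results are (id, score) tuples.

    The main page is filled with web results plus at most max_tweets tweets
    whose score clears tweet_threshold; all other tweets stay in their own
    separate vertical and never reach the main page.
    """
    promoted = sorted(
        (t for t in tweet_results if t[1] >= tweet_threshold),
        key=lambda r: r[1],
        reverse=True,
    )[:max_tweets]
    merged = sorted(web_results + promoted, key=lambda r: r[1], reverse=True)
    return merged[:k]


web = [("w1", 0.9), ("w2", 0.7), ("w3", 0.5)]
tweets = [("t1", 0.95), ("t2", 0.85), ("t3", 0.82), ("t4", 0.3)]
main_page = blend(web, tweets, k=4)
```

The cap is what keeps a burst of hundreds of tweets about a news event from crowding out ordinary results: even if every tweet scores well, at most two appear on the main page.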


Two thoughts: 1) I worked on search quality back in 2005, and Larry has been saying this for years. Even back then he was talking about indexing everything, and indexing it in seconds instead of hours.

2) Search is won on the margins. Yahoo and Google do equally well on most queries, but users decide which engine is better based on how it performs across all the types of searches they do. Take the 80/100 searches that are not real-time, use unique keywords, and where you know what you're looking for: on those, Y! and G come out the same. It's on the other 20/100 that Google wins users.
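The "won on the margins" argument above is simple weighted arithmetic. In this sketch the 80/20 split comes from the comment, while the per-type success rates are made-up numbers chosen only to show that when two engines tie on the common queries, the whole difference comes from the minority slice.

```python
def overall_success(mix, success_by_type):
    """Weighted success rate of an engine over a user's mix of query types."""
    return sum(share * success_by_type[qtype] for qtype, share in mix.items())


# Query mix from the comment: 80% ordinary searches, 20% real-time/hard ones.
mix = {"ordinary": 0.8, "hard": 0.2}

# Hypothetical success rates: identical on ordinary queries, different on hard ones.
yahoo = {"ordinary": 0.9, "hard": 0.3}
google = {"ordinary": 0.9, "hard": 0.6}

gap = overall_success(mix, google) - overall_success(mix, yahoo)
# gap == 0.2 * (0.6 - 0.3) == 0.06 (up to floating-point rounding)
```

The ordinary-query term cancels out entirely, so the user's overall impression is determined by the 20% where the engines differ.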


Update: Hey, you're the guy who launched Google Transit! I could use your advice (see http://www.ridecell.com/gt/about/). Can I email you?

I think indexing everything in seconds could definitely be a competitive advantage. I haven't tried Yahoo in years and back then it did much worse than Google. If it has improved this much, it makes sense that the competition is at the margins.

On the other hand, the TechCrunch/Twitter et al. idea of real-time search seems to be limited to indexing Twitter and Facebook updates as they happen. The arguments they present amount to "Someone tweeted about a plane crash from the crashed plane". I don't think indexing such tweets is going to be Google's edge in search. OTOH, I can see using such information to generate Google Alerts being very useful for some people.
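The alerts idea above inverts normal search: instead of running a query against stored documents, each incoming post is checked against stored queries. This is a minimal sketch with hypothetical names (`AlertMatcher`, `subscribe`, `on_post`), assuming an alert fires when a post contains every term of the query; it is not the actual Google Alerts design.

```python
import re


def tokenize(text):
    """Lowercase word tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class AlertMatcher:
    """Hypothetical standing-query matcher for a stream of posts."""

    def __init__(self):
        self.alerts = {}  # alert name -> set of terms that must all appear

    def subscribe(self, name, query):
        self.alerts[name] = tokenize(query)

    def on_post(self, text):
        """Return the names of all alerts this post triggers, sorted."""
        words = tokenize(text)
        return sorted(name for name, terms in self.alerts.items() if terms <= words)


matcher = AlertMatcher()
matcher.subscribe("crash", "plane crash")
matcher.subscribe("lost", "lost statue")
```

With millions of subscriptions, looping over every alert per post would be too slow; a real system would index alerts by term so each post only touches the alerts that share at least one of its words.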


Yes, recently I've changed my search habits a bit: I still mostly use Google, but when I want to find out what's newest about some topic, more often than not Twitter Search gives better results than Google. So I think Larry is correct.

Still, Twitter doesn't do "realtime search", it does "realtime Twitter search", so what Googlers ought to do will be more complex.



Larry: just buy Scoopler and be done with it.


How does Google figure out relevance in real time? With Twitter, it's user-driven content with tags, etc.

But with the net at large (blogs, etc.), this becomes difficult. Incoming links and similar signals are hard to determine in real time (primarily because the links haven't been made yet).


The site's PR, uptime/age, update rate, and uniqueness of content. By this measure HN is near the ideal: it has amazing inbound PR but does not link out much, it's been running fast and fine for ~3 years, and the content is often unique, in the sense that it contains phrases Googlebot has never encountered before.
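Of the signals listed above, "contains phrases Googlebot has never encountered" is the easiest to make concrete. A common way to measure it is word shingling: split text into overlapping n-word sequences and count how many have not been seen before. This is a sketch of that general technique under assumed parameters (4-word shingles, a plain Python set as the "seen" index); it is not a description of Google's actual signals.

```python
def shingles(text, n=4):
    """Set of overlapping n-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def novelty(text, seen, n=4):
    """Fraction of the text's shingles not already in `seen` (1.0 = all new)."""
    s = shingles(text, n)
    if not s:
        return 0.0  # too short to judge
    return len(s - seen) / len(s)


# Hypothetical index of previously crawled text.
seen_index = shingles("the quick brown fox jumps over the lazy dog")
```

A page scoring near 1.0 is mostly phrasing the crawler has never met, which is the property the comment credits HN with; a scraper copying existing text would score near 0.0.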

An experiment: here is a search that matches an exact phrase in this comment. http://www.google.com/search?q=%22By+this+measure+HN+is+near...

At the moment it returns nothing. Within a minute or two, this comment will be the first result.


52 minutes and counting. ;)


Yep. Fail. Stuff from yesterday is indexed, however:

http://www.google.com/search?q=confusingly+called+copy-regio...


The quotes seem to be the problem; it works without them for me.


12 hours later, still no match.


It has appeared in the index after somewhere between 12 and 17 hours.


Imagine if this had happened 5-8 years ago and, instead of using Twitter/FB/MySpace for the 'activity stream', it used IM statuses from all the major IM networks (ICQ, AOL, MSN, Jabber, etc.).

Bizarro world for sure, but interesting to ponder.


I wrote exactly this 4 years ago :)

It was called AwayGrabber (www.awaygrabber.com).

I wrote an overly complex crawler in C to grab away messages from IM networks as fast as rate limiting would allow, then created a web frontend for viewing all of the status messages from your friends.

It was cool because in most clients at the time you needed to click on a friend and select "get info" for each status you wanted to read. Feeds for statuses make much more sense. However, I got tired of trying to reverse-engineer the changes in various closed protocols (OSCAR, etc.). So I did more than ponder this when I was in college; I tried it.
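Crawling "as fast as rate limiting would allow", as described above, is commonly done with a token bucket: each request spends a token, and tokens refill at a fixed rate. The original crawler was written in C and its design isn't described, so this Python sketch is only an illustration of the general technique. `fetch_away_message` is a hypothetical stand-in for a real protocol call, and the clock is injectable so the behavior can be checked without sleeping.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        now = self.clock()
        # Refill tokens for the time elapsed since the last call, up to capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def poll_statuses(buddies, fetch_away_message, bucket):
    """Fetch away messages for as many buddies as the bucket allows right now.

    Returns (fetched, deferred): a dict of buddy -> message, and the buddies
    to retry on the next pass.
    """
    fetched, deferred = {}, []
    for buddy in buddies:
        if bucket.try_acquire():
            fetched[buddy] = fetch_away_message(buddy)
        else:
            deferred.append(buddy)
    return fetched, deferred
```

Deferring rather than sleeping keeps the polling loop responsive; a scheduler can call `poll_statuses` again with the deferred list once tokens have refilled.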


I really thought Microsoft was ahead on this when I noticed their Live.com spider hitting my site three times as often as Google's (3:1). That was a year ago. What took Google so long to realize real-time is where it's at?
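A crawler ratio like the 3:1 mentioned above is usually read off a web server's access log by counting requests per crawler user agent. This sketch assumes the Apache/nginx "combined" log format (user agent as the last quoted field) and assumes the user-agent tokens `msnbot` (Live Search's crawler at the time) and `Googlebot`; both the format and the token list are assumptions to adjust for a given server.

```python
import re
from collections import Counter

# In the combined log format, the line ends with "referer" "user-agent".
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')

# Assumed user-agent substrings -> label.
BOTS = {"Googlebot": "google", "msnbot": "live.com"}


def crawler_hits(lines):
    """Count access-log lines per known crawler, keyed by label."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # not in combined format; skip
        user_agent = match.group(1)
        for token, label in BOTS.items():
            if token in user_agent:
                counts[label] += 1
    return counts
```

Dividing `counts["live.com"]` by `counts["google"]` gives the ratio. User-agent strings can be spoofed, so a careful comparison would also verify crawler IPs via reverse DNS.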



