Google: Bing Is Cheating, Copying Our Search Results

raganwald · on Feb 1, 2011

Hmmm. Let's say that Bing sets up a script that sends queries to Google and then records the results. That's clearly copying. But what Bing does is when you use its toolbar, it watches what you do and uses that information to rank results. Is that really copying? It showed Google's Honeypot page because Google's engineers were clicking on the Honeypot page with the toolbar installed. That isn't copying Google's results, that's copying the actions of Bing toolbar users.

This can easily be demonstrated. Google can set up a second honeypot but instruct its engineers not to click on the link, ever. If it shows up in Bing's results, then Bing is watching what Google returns and scraping its results.

But if the second Honeypot doesn't show up in Bing's results, then clearly Bing isn't copying Google's results, it's copying its toolbar's preference for links.

The entire thing is moot to me. The takeaway isn't whether Bing copies Google. The takeaway is that Bing's toolbar is spyware :-)

GavinB · on Feb 1, 2011

> Let's say that Bing sets up a script that sends queries to Google and then records the results. That's clearly copying.

I'd question even that hyperbolic interpretation. Let's say that Google sets up a script that sends queries to websites, records the results, and incorporates the links shown on each site into its search rankings. Is that clearly copying? No, that's just pagerank.

If you have a web directory, a link page, a blogroll--isn't Google "copying" your work by using it to improve its search results? How is that any different from what Bing's doing?

gphil · on Feb 1, 2011

> No, that's just pagerank.

This is my first thought as well. Google's pagerank analyzes the link structure of the web as one of the inputs to its search ranking. Apparently, Bing's toolbar analyzes page content coupled with user click behavior as one of the inputs to its search ranking.

These two things don't seem very different to me. Both of them are relying heavily on the value provided to them by tracking and analyzing the behavior of users on the web to drive search results.

tungwaiyip · on Feb 2, 2011

I have the same thought. It is more a matter of framing. A while ago some people accused Google of unethically profiting because they are farming the link structure of the Internet, which is the labor of many people (was it Nicholas Carr?). I don't really buy this framing. But Google's accusation seems to fall along a similar line of argument. You can also set up a "Google sting" to prove they are copying from the Internet. It is called "Google Bombing".

Bing will be at fault if they specifically target Google. But if you consider that entering a keyword and then clicking a link is essentially targeting Google search, then it only exposes another problem, namely Google's monopoly on the search market.

raganwald · on Feb 1, 2011

We agree that it is copying, the subject open for interpretation is whether it's "wrong" in some way. Information is all about copying, that's the whole point :-)

krosaen · on Feb 1, 2011

but the link between the obscure query and the click on the page wouldn't have been made without bing knowing the user first searched for the query on google, no? if it were simply boosting page clicks that would be one thing, but how else could rim.com rank 1st for "mbzrxpgjys"?

brudgers · on Feb 1, 2011

What the experiment shows is that where no other data is available Bing will use what it has and that Google can successfully seed Bing on the long tail. What it doesn't show is that in typical circumstances Bing is relying on data gathered from Google searches.

Microsoft is collecting the same sort of information on Google queries that it collects on Bing queries and that Google collects on Google queries. All this is happening at the long tail where both companies are most likely using something other than webcrawling to tailor search results - after all, the whole experiment is only possible because Google can seed page rankings at will to link arbitrary terms to specific search results.

seanalltogether · on Feb 1, 2011

Well yes, there is a hard link happening between a Google search result and a link being clicked. However, Google's argument isn't as strong if it turns out Bing is doing this for all search engines. It might be that they aren't targeting Google specifically, but instead they're targeting all search sites generically.

danpker · on Feb 1, 2011

But then still, the Bing toolbar is watching what you're searching for and recording that information. That's a pretty big privacy issue

patio11 · on Feb 1, 2011

Google doesn't really want to get into a heated discussion about the evils of a search engine knowing everything you've ever searched for. Stones, glass houses, etc.

(Given Google's near-monopoly of the market, Microsoft and DDG have some amusing competitive synergy going on, don't they. DDG can criticize Google all they please for retaining user data because DDG doesn't and isn't in a position to benefit from it. Microsoft, which certainly is in a position to benefit from it, doesn't need to worry about Google calling them on it because Google is the only search engine that can actually lose market share over the issue.)

bertil · on Feb 1, 2011

For all Google’s sins, there is a Dashboard that lets you erase everything you'd rather they not know: Google could promote that heavily if it came to stones and glass houses. I always assumed this would be a great way to learn more about queries: spotting what words people are ashamed to have searched for.

It could be something a registered user could set from a browser toggle, and DuckDuckGo is a very good project, of course. My point was: data portability and user control are within Google's long term interest, not being evasive about their data cache.

seabee · on Feb 1, 2011

While it's great that Google lets you delete information, harvest-your-data-by-default is not a choice made with your best interests at heart.

taylorbuley · on Feb 1, 2011

This is what all toolbars do, and is largely the point of why big companies offer them and pay little software companies to make them optional installs (see Corel's WinZip, which installs the Google toolbar)

jamesjyu · on Feb 1, 2011

And that is explicitly stated in their tos. There's no hiding here.

bertil · on Feb 1, 2011

Let’s agree “not hiding” is what is in the demo video. TOS… we all know they could add that you sell the soul and the virginity of your mother in there, no one would read it.

barista · on Feb 1, 2011

It only records that when the user has explicitly agreed to send anonymous data. chrome and Google toolbar do that too and so do most of the toolbars out there.

T-R · on Feb 1, 2011

I think this would still leave Google with a fairly strong argument - if Bing does it for all search engines, then they're effectively copying whoever is most popular. Since it's done through Internet Explorer, which is still bundled with Windows in most places, they could try to make the argument that Microsoft is using their position in the OS market to crush competition in other markets.

bertil · on Feb 1, 2011

Interesting angle to go through the tied market and competition policy: that's a type of authority that is far more intelligent, and precisely just prosecuted IE in Windows. However, you'd have to either have a US court acknowledge that a European was right to disagree with them in the first place, or have a European court admit that their previous decision wasn’t enough. It’s feasible, but hard.

Where you’ll be more limited with it, is that it’s apparently not IE, but the Bing Bar that is at stake—the connection is getting thinner.

DenisM · on Feb 1, 2011

If a page contains a unique word, and people who were on that page universally go to a different page, that could be enough evidence for bing to assume there's a link between the unique word and the target page.

amalcon · on Feb 1, 2011

There are browser toolbars that aren't spyware? I'd think anyone likely to actually hear about this would have already been assuming that all browser toolbars are spyware.

stcredzero · on Feb 1, 2011

It might be a good exercise to chase down old posts of people who wrote the first browser toolbars, as well as the browser infrastructure that made them possible. We can contrast the speculation on why they might have been a good idea with the actual result. Not as a way to trash them, but as an exercise in how smart people can miss the mark.

jodrellblank · on Feb 2, 2011

More popular if done as a way to trash them, I expect. You could make "trash old usenet posts".com and have a ranking system for "worst decisions ever".

(C.A.R Hoare's billion dollar mistake, for example).

ig1 · on Feb 1, 2011

As it happens there's an exact historical precedent for this. Post code <-> long/lat data is copyrighted in the UK, but users were using Google Maps (and others) to do conversion and supply them to open source databases, the end result is that Google had to change their licensing/API to restrict this sort of behaviour.

Just because you're copying the data indirectly through a third party doesn't mean you're not breaching the copyright.

raganwald · on Feb 1, 2011

Interesting case to raise. Reminds me of the NFL terms voiced after every game about the broadcast being for private use only. I imagine that if their terms don't already include a clause like this, they can try to suggest that a toolbar tracking user clicks is violating the terms.

Very murky waters. If Google starts complaining that other people are tracking their users, they might end up educating users about how much they and their advertisers track.

...I hope DuckDuckGo figures out a way to capitalize on this brouhaha...

notahacker · on Feb 1, 2011

UK law seems to be unusually stringent on this though (e.g. FootballDataCo claims copyright on facts such as dates of football fixtures)

tintin · on Feb 1, 2011

"The takeaway in't whether Bing copies Google. The takeaway is that Bing's toolbar is spyware" And also that you can control search results in Bing. Nice feature for advertisers.

bertil · on Feb 1, 2011

You might need to hold in your hands the ethics of the webspam team at a major search engine, though.

Jamiecon · on Feb 2, 2011

Only for terms that are worthless.

cjoh · on Feb 1, 2011

For all the money, resources and engineering talent that Microsoft has, you'd think they wouldn't need to do this, though. That's what's baffling to me.

Call Google the market leader all you want, but let's not forget that Microsoft's market cap is around 40 billion dollars greater than Google's.

That's more than Research in Motion's total value!

poacher69 · on Feb 2, 2011

how would you expect a search engine to be able to surface a web-page given a specific query in which there is no data to create the relationship except the signal from google engineers spamming bing with click data.

these are cases of outliers. they don't exist on the real internet, or at least where pages exist without any other data (anchor text, inlinks, outlinks, words from the query in the document) they never get surfaced from a search engine.

absent fake click-data there is no way google could surface these documents for the specific queries. in fact google states this openly in their "attack piece". before they manually changed the rank of these documents they didn't surface these either.

the only evidence of "cheating" is that bing surfaces documents for which there is no known relationship between the query and the document, except for spam created by google engineers. this is evidence only of a bug in bing's ranking algorithm. clearly it is using signals from google. just like google uses signals from CNN (keywords, inlinks, outlinks, anchor text, etc).

i'm sure bing is thankful to google for helping find this defect in their system and are hard at work to fix it.

people talk about bing copying search results like google invented search results and put a lot of hard work into them. in this case the only hard work they put in was designed to spam bing.

i can only conclude that google is getting worried about bing quality and has run out of ideas on how to fix their own problems.

user24 · on Feb 1, 2011

all search engines make use of a variety of signals. Bing decided to use what users click on as a signal. Google spotted it and thought it was 'zomg bing are stealing our results'. I don't understand why you think taking advantage of a new signal to improve search is not a smart move by Bing?

bluelu · on Feb 1, 2011

The googlers are just angry that they couldn't come up with that genius idea themselves.

redthrowaway · on Feb 1, 2011

I don't see this as being any different from what Microsoft has been doing for 20+ years. They let a competitor put the work into figuring something out, then make a reasonably accurate facsimile thereof. I think it's lazy, but not particularly unethical. If Google were Benz, would they be complaining that Ford was making 4-wheeled vehicles with an engine? More appropriately, and given that I've been on a Top Gear bender lately, if Google were Cadillac would they be complaining that everyone else was copying their method for operating vehicles, with three pedals, a gear shift, a steering wheel and a handbrake?

I get why Google is upset, but this doesn't strike me as unethical behaviour in a free market.

calebjenkins · on Feb 1, 2011

Does anyone think that the Google toolbar doesn't do the exact same thing? Just sayin

newman314 · on Feb 1, 2011

Here's an easy test. Test with the Bing toolbar installed but with Bing and some other search engine (blekko, whatever).

This should help establish if it's the toolbar that is sniffing.

If so, while it may be questionable behavior, Bing would not be copying Google's results.

Matt_Cutts · on Feb 1, 2011

I had a front row seat for this test. I believe the experiment we ran provides conclusive proof. I'm on a panel with a representative from Bing later today and I'll ask Bing about this directly.

encoderer · on Feb 1, 2011

It's embarrassing for Google to complain about this. You FINALLY get a little competition on your turf and you try to make some big issue that, as a market leader, the product you produce is being watched, analyzed and in some ways incorporated by your competitor.

There is no victim here. They are not taking your 1st result and copying it. They are taking the result the user clicked. Obviously you didn't predict that with your algorithm or you'd have always made that the 1st result. Instead, what they're tracking is user behavior, not your raw ranking.

Obviously users give Google implicit permission to track their behavior by using your product. And similarly, by installing the Bing toolbar, they're giving Bing that permission.

This is beneath you Matt and it's beneath Google.

DenisM · on Feb 1, 2011

In short, Bing Toolbar infers a relationship between words on the page and the next page the user clicks on. Google's team purposefully confined the Bing Toolbar behavior-tracking algorithm to their use of the google.com search result page, and then cried foul about "bing stealing Google's search results".

This is disgraceful attention-whoring on Google's part. Quite surprising, too, as I don't remember them ever stooping that low.

dunham · on Feb 1, 2011

They may just be using terms in the referring URL rather than the referring page. It'd probably yield better data. Most site-specific search pages would have the search term in the URL.

I wouldn't be surprised if they're more interested in data from domain-specific sites like epicurious than generic search sites like Google.

I'd also guess that this won't work once the SEO guys figure out they can feed fake clickstreams to MS.

DenisM · on Feb 1, 2011

Referring URL? Oh right, Google intercepts all search result clicks through a redirector, and includes the search term in the intercepting URL. Yeah, the unique words in that URL could figure into describing the content of the final destination page (after redirect), the same way anchor text figures as well.
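To make the referring-URL idea concrete, here is a minimal sketch, assuming the search term travels in a `q` query parameter of the referrer. The function name and the example URLs are made up for illustration, and nothing here is known to be what the toolbar actually does.

```python
from urllib.parse import urlparse, parse_qs


def query_terms_from_referrer(referrer, param="q"):
    """Return the search terms carried in a referring URL, or [] if none."""
    # parse_qs decodes the query string, turning '+' and %20 into spaces.
    values = parse_qs(urlparse(referrer).query).get(param, [])
    return values[0].split() if values else []


# Hypothetical referrer for the synthetic query discussed upthread.
print(query_terms_from_referrer("http://www.google.com/search?q=mbzrxpgjys&hl=en"))
# → ['mbzrxpgjys']
```

Any site that puts its search box contents in the URL would feed the same function, which is why this approach would not need to know anything about Google in particular.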

tommi · on Feb 1, 2011

Just because you are big, doesn't prohibit you from exposing issues.

We really don't know how much value Bing puts on clicks made on Google. Perhaps a lot?

What's Google's official stance on this?

forgotusername · on Feb 1, 2011

Even Matt's allegation is softened by "I believe" here: there appears to be nothing that conclusively indicates Bing is solely targeting Google. For example, the observed behaviour could be a side-effect of a generic algorithm to extract and associate search queries with a user's click stream, which is only a minor variant of what Google itself does with its own toolbar.

If the case described above were true, then all Google has done here is to make inconclusive accusations and use the occasion to highlight its own dominance over search.

It seems to me this is just a cheap and slightly seedy PR stunt.

ig1 · on Feb 1, 2011

Associating search queries and click stream behaviour is fine if it's your own search engine. Doing it for someone else's search engine isn't (regardless of who's doing it).

forgotusername · on Feb 1, 2011

In the hypothetical situation above, it has almost nothing to do with the search engine - it is both the user providing the query and selecting the result: this is the data of value, not which intermediary provided the list of results to select from.

visural · on Feb 2, 2011

Really? Doesn't the intermediary that narrowed the list from several billion possible matches to the best 10, including correcting the inherent spelling errors in the query carry some value?

chollida1 · on Feb 1, 2011

> Associating search queries and click stream behaviour is fine if it's your own search engine. Doing it for someone else's search engine isn't (regardless of who's doing it).

Why? This seems like a great idea.

axod · on Feb 1, 2011

I don't think it's embarrassing to point out that Microsoft is playing dirty.

They played dirty with Netscape/IE in the 90s and look what happened.

TheCondor · on Feb 1, 2011

They have a very long history of playing dirty. Lotus learned it, MS took information Lotus shared with them and then shared it with Excel and Office and they supposedly kept Lotus on an API changing treadmill. Digital Research learned it, MS wrote code that made Windows 3.x crash if it detected DR-DOS. Netscape learned it. Arguably, IBM(OS/2) and any other operating system vendor learned it in the 1990s as well, MS charged premiums if hardware vendors wanted to install non-Windows operating systems. They sort of tried to do it to Intuit, they made a competitor and then effectively gave it away for free. Enough so that a lot of folks avoid Mono like it's, well, actually mono. They've established that reputation, and most of the time, by the time it became clear what was going on, MS had already done irreparable damage.

It is kind of embarrassing for Google, but if it is real and it continues, it's better to address it now rather than after MS becomes a titan of search and Google's market has eroded. At times, it seems like MS has changed in ways, but fundamentally they're still run by the same guys. Remember that when you play your Xbox or use Bing or any MS products, they don't like to see other successful software companies.

axod · on Feb 1, 2011

Also there's the strange fact that Bing and their whole online division makes gigantic losses. They're not in it for the money, they're in it to stifle competition and hold back progress so they can milk their cashcow some more.

bertil · on Feb 1, 2011

I think they are complaining because it could be far more widespread: it would actually be easier for the head than the long tail. Where it‘s harder is for News, and Bing appears to lag for recent results.

I remember that when Bing went out, everyone was wondering how close to Google the results were (and talked about it as a good thing).

haberman · on Feb 1, 2011

If it's beneath Google to complain about Microsoft riding on its coattails for the highly valuable "long tail" of queries, surely it's beneath Microsoft to sue Android manufacturers for competing in the smartphone space?

http://www.ft.com/cms/s/2/8b1ecaa2-cdb2-11df-9c82-00144feab4...

Would it have been better if Google had jumped straight to the questionable lawsuit part, like every other company seems to do when threatened on its own turf?

alain94040 · on Feb 1, 2011

Let me try to understand what happened. For some obscure searches that return no results normally, a handful of users searched on Google using IE's toolbar and then clicked on bogus results.

Those bogus results made it to Bing's results eventually.

Ok... So it proves Microsoft analyzes the toolbar behavior and when it has no other data, it will therefore look like a copy of Google search.

Sounds fair to me. Do you want to get into a discussion on how exactly Google tracks you online?

contextfree · on Feb 1, 2011

"Those bogus results made it to Bing's results eventually."

... in 7-9% of cases.

poacher69 · on Feb 2, 2011

in 7-9% of the times google tried to spam bing. not in 7-9% of search results. this indicates that maybe google isn't that great at figuring out how to spam bing or that bing is pretty good at defending against spam. maybe google could take some lessons from bing on cleaning up spam, a problem that seems all too prevalent on google these days.

Groxx · on Feb 1, 2011

There is, however, the legitimate complaint that they apparently do not finally have a little competition on their turf. Their competition is cheating, not innovating. That helps who, how? At best for "competition" sakes, Bing nabs a big share of the market; now there are two big dogs who make it hard to enter into the search realm with new ideas.

TimothyBurgess · on Feb 1, 2011

As we all know, this isn't the first time Microsoft has copied someone else. And I'm sure it won't be the last.

I think Google has a right to complain. Microsoft has resorted to these less than innovative tactics to monopolize themselves for a long time now, and it isn't fair to companies like Google who have worked their butts off (and gave 1.8 million shares - $336M in 2005 - to Stanford for the PageRank algorithm) to develop their superior product.

bkhl · on Feb 1, 2011

You clearly didn't read the whole article. If you did, you clearly didn't understand the article. Step back a little bit, fanboy. Microsoft wasn't copying anyone here.

TimothyBurgess · on Feb 1, 2011

Oh I definitely did read it in its entirety and understood it perfectly. What you're failing to do is see the whole picture.

Let's put it this way... if Google hadn't bought the PageRank algorithm from Stanford and put years of work into perfecting their search results, Microsoft wouldn't have any way to track which Google search results users click. It's an unfair tactic that clearly demonstrates Microsoft's sketchiness and desire to monopolize themselves (by any means necessary, "evil" or not) wherever there's a computer.

As for the fanboy comment... I'm certainly not a fanboy but I'll let the following speak for itself: Microsoft Internet Explorer vs Google Chrome

bkhl · on Feb 2, 2011

All your comments are coming from your assumption that Microsoft is trying to monopolize in something - in this case, search. Hence, your comments (although you will disagree) are biased and irrational. Microsoft isn't trying to monopolize in anything nowadays. In fact, they can't, so they aren't even trying.

From a search engine user's point of view, I believe this whole fiasco is ridiculous. First, it's ridiculous because Google is handling this situation very immaturely. Matt Cutts should not have confronted the VP of Bing in the way he did. Second, if I were a user of the Bing Toolbar, I would have given the Bing Toolbar permission to use my behavior to polish my search results. I have no problem with that. Lastly, the experiments they did have more to do with "guessing what user wanted" than "what PageRank does".

I've used Bing fairly often over the past 6 months because of too much spam in Google's search results. Now that Google has fixed (or is still working on) the spam problem, I'm starting to use Google again. However, what I noticed from the past 6 months is that Google search isn't so much better than Bing. This Bing Toolbar fiasco only applies to synthetic queries that I would never make.

Is Bing cheating? I don't think so. To me, they are just using another signal from user's permission. However, the definition of cheating will be different for everyone else.

alttab · on Feb 1, 2011

I disagree. From some point of view somewhere this is "standing on the shoulders of giants."

Perspective changes things here, which means no one is "right" or "wrong".

recoiledsnake · on Feb 1, 2011

Err, by that line of thinking, Google leveraged Linux (the hard work of volunteers) to earn tens of billions and does not release the modified code for use of the volunteers. Of course they are not required to, but it isn't fair to Linux developers who have worked their butts off to develop Linux.

xuki · on Feb 1, 2011

I strongly disagree, it's not the same analogy. Linux developers explicitly say you can use the code for free.

It's more like Linus says you can't use the code, but Google uses it anyway.

Btw, Google contributed a lot to open source projects.

recoiledsnake · on Feb 1, 2011

>It's more like Linus says you can't use the code, but Google uses it anyway.

By installing the Bing Toolbar, users are giving permission to track their clicks. If Bing's server farm is searching Google and parsing the results then it is more like your example.

krakensden · on Feb 1, 2011

They employ Andrew Morton and Ted Ts'o, and have been working pretty hard to get the delta between their custom Linux and upstream down. I'd say they've been pretty fair to Linux developers.

user24 · on Feb 1, 2011

I have a suspicion what you'll find is that Bing use the toolbar to match $current_page_content with $clicked_page_content. When $current_page_content contains obscure words, that becomes the only signal, and so bing's engine will naturally associate it with $clicked_page.

In other words, there's a relationship between Page A and Page B if there exists a link between them (==PageRank). But the strength of the relationship is increased based on how many users click on that link. I think that's the information Bing were trying to capture (or if they weren't, they should have been).

What I'm saying is that it's probably an unintentional side-effect. At scale though, the effect is that Bing gradually uses Google as a signal, simply because Google is a popular site.

edit: Yet another way of saying it: I think it's not just clicks on Google searches that are captured by Bing, but clicks anywhere. Google is a large site, so its influence on Bing can be measured. This is what we're seeing. My theory. I don't work in search.
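A toy sketch of that theory, purely illustrative: every name below is hypothetical and this is a guess at the shape of such a signal, not Bing's actual pipeline. Each click is recorded against the words on the page the user was on, and a term is then matched to whichever URL was clicked most often from pages containing it.

```python
from collections import Counter, defaultdict


class ClickSignal:
    """Associates words on the current page with the page clicked next."""

    def __init__(self):
        # term -> Counter mapping clicked URL -> number of clicks
        self.clicks = defaultdict(Counter)

    def record(self, page_terms, clicked_url):
        # Count each distinct term once per click event.
        for term in set(page_terms):
            self.clicks[term][clicked_url] += 1

    def best_match(self, term):
        counts = self.clicks.get(term)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


signal = ClickSignal()
signal.record(["search", "results", "mbzrxpgjys"], "http://www.rim.com/")
signal.record(["search", "results", "weather"], "http://weather.example/")
signal.record(["search", "results", "weather"], "http://weather.example/")

print(signal.best_match("mbzrxpgjys"))  # the only URL ever clicked for this term
print(signal.best_match("search"))      # common term: the majority click wins
```

For an obscure word there is exactly one data point, so a single click decides the result. For common words the same click is drowned out by everything else, which would explain why the effect only shows up on synthetic queries.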

firebones · on Feb 1, 2011

Exactly. If they are just matching (even more simply) $search_term_entered to $clicked_link then you would expect that they are "copying" from any search engine configured in the toolbar.

Now the interesting thing to reverse engineer is what other information might be passed along to give relevance to the search term/click pair. If Google could establish that there was a third piece of info in the tuple, such as "originating search domain" and that Bing used this to weight term/click pairs based on the authority of the source, Google's claims would hold more water. I suspect that Bing has to apply some kind of validation of the term/click pairs (for instance, only sending pairs that appear on the same results page from accredited engines), otherwise they would be subject to "Bing bomb" attacks where users or botnets vote up lower ranked (or even unranked) clicks for a given term. (And if they don't validate or detect gaming, then there would be ample opportunity to inject all kinds of synthetic behavior into Bing's search results. Based on the relatively small number of users and clicks it took to own a long tail term, it seems like the protection they have is very weak or simple.)
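As a rough illustration of what the simplest validation could look like, here is a sketch that only trusts a (term, URL, source domain) triple once enough distinct users have produced it. The threshold, the tuple layout and the function name are all assumptions made for the example; nobody outside Microsoft knows what checks, if any, are applied.

```python
from collections import defaultdict

MIN_DISTINCT_USERS = 3  # assumed threshold, purely illustrative


def trusted_pairs(events, min_users=MIN_DISTINCT_USERS):
    """Keep (term, url, source_domain) triples seen from enough distinct users.

    events: iterable of (user_id, term, clicked_url, source_domain) tuples.
    """
    users = defaultdict(set)
    for user_id, term, url, source in events:
        users[(term, url, source)].add(user_id)
    return {key for key, who in users.items() if len(who) >= min_users}


# One user clicking ten times counts as one vote; three users pass the bar.
events = [("u1", "foo", "http://spam.example/", "google.com")] * 10
events += [(u, "bar", "http://good.example/", "google.com") for u in ("u1", "u2", "u3")]
print(trusted_pairs(events))
```

A filter this simple stops one person hammering a link, but a botnet with many identities walks straight past it, which is the weakness described above.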

mvandemar · on Feb 1, 2011

This makes a lot of sense, and would have been easy enough for Google to test as well: create some tiny, brand new, never before heard of test search engine that Bing would have no reason to copy, and see if the same thing happened.

user24 · on Feb 1, 2011

That would be a nice way of testing it.

edit: I'm not even sure if it's only search engines that are being analysed by Bing or all pages, but it's possible that it is just SEs - they could be capturing query terms distinctly.

mvandemar · on Feb 1, 2011

They also probably should have tested to see if it happened with results other than the ones in the #1 spot.

recoiledsnake · on Feb 1, 2011

Also, why did the experiment succeed for only 6 or 7 of the 100 terms that they tried? There's more than meets the eye here, regardless of the hype and everyone jumping on the bandwagon.

raganwald · on Feb 1, 2011

The article suggests that the Bing toolbar monitors what its users click and uses that information to improve Bing search results. Is that what you have conclusively proved?

I'm interested in another experiment. If you set up a honeypot, search for the term, but never click on the link, does the honeypot start showing up in Bing? The article doesn't say whether you tried this. Did you try it? Are Bing scraping your results from the page or only tracking their users' clicks?

Matt_Cutts · on Feb 1, 2011

Anyone can test that Microsoft's software sends the clicks back to Microsoft, although I believe Microsoft sends the data back by SSL, so it's harder to verify even that than you'd expect.

Google's search results are blocked in robots.txt, so I don't believe Bing has been able to crawl our search results directly. All the evidence points to users' clicks on Google, which are then sent to Microsoft.

Microsoft has (so far) declined to admit whether our allegation is true. Getting them to talk about exactly what they do and what software they use or don't use would be the easiest way. I'd like them to confirm or deny, which is why I wanted to go to this search panel later today and ask them.

kgrin · on Feb 1, 2011

> so I don't believe Bing has been able to crawl our search results directly

Isn't compliance with robots.txt more of a voluntary thing?

I'm not accusing MS of ignoring it when convenient, but if you/we/someone is accusing them of acting unethically wrt search results in the first place, telling the crawler to ignore robots.txt wouldn't be that far away, would it? (And likewise faking the user-agent, etc.)

For better or for worse, UA identification, robots.txt compliance - all those things are voluntary. I'm not suggesting they shouldn't be, but it certainly makes a difference in terms of whether something's possible or not. (And, if you ask me, places an even higher obligation on the actors to behave ethically, lest trust completely evaporates and the whole thing goes to hell in a handbasket).
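To show how voluntary it is, here is a small sketch using Python's standard `urllib.robotparser`. The rules and the bot name are made up for the example and are not Google's actual robots.txt. The check runs inside the crawler's own code; the server never sees it and cannot enforce it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, fed in directly rather than fetched.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /search",
])


def polite_fetch_allowed(user_agent, url):
    """A well-behaved crawler asks this before requesting a URL."""
    return rules.can_fetch(user_agent, url)


print(polite_fetch_allowed("ExampleBot", "http://www.example.com/search?q=foo"))
print(polite_fetch_allowed("ExampleBot", "http://www.example.com/about"))
```

A crawler that simply never calls `polite_fetch_allowed` can request `/search` like any browser would, so robots.txt tells you what a site asked for, not what a crawler did.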

jasonwatkinspdx · on Feb 1, 2011

I am not a lawyer, but as I understand it there is some precedent in the US of intentionally ignoring robots.txt being unauthorized computer access, exposing you to all the liability that entails (possibly criminal).

c2 · on Feb 1, 2011

I'd like to see an actual case reference for this. I've never heard of ignoring robots.txt resulting in any kind of legal action.

It would take a pretty big leap to go from robots.txt is advisory to ignoring it constitutes a criminal action.

jasonwatkinspdx · on Feb 1, 2011

Internet Archive was sued unsuccessfully. As I understand it a lawsuit is still in process against Google on the topic. So I guess the precedent is weaker than I thought, but still: tread carefully.

brudgers · on Feb 1, 2011

Matt, don't Google Toolbar and the Chrome Browser similarly send information to Google for use in improving their services?

stanleydrew · on Feb 1, 2011

If you read the article and other comments here it's been made perfectly clear that the Google toolbar and Chrome browser are not sending similar data back to Google.

271828183 · on Feb 1, 2011

Ah, at least the google toolbar does. If you enable PageRank on the Google Toolbar it sends back all the urls you visit just like the bing toolbar.

From the toolbar privacy policy: "Toolbar's enhanced features, such as PageRank and Sidewiki, operate by sending Google the addresses and other information about sites at the time you visit them."

Google has managed to demonstrate one way MS appears to be using the data. What does google do with their trove of data? That's a lot of data to collect and not do anything with.

If they want to make it perfectly clear they should add it to their privacy policies and EULAs.

stanleydrew · on Feb 1, 2011

Yes absolutely. I don't think anyone in this thread or in the article denied that the Google Toolbar sends data to Google. And you are absolutely right that Google's use of the data collected should be clearly stated in a privacy policy and EULA. It might be, I haven't read them.

But the article clearly covers the available public statements on this issue and patio11 dug up a post from Matt Cutts in his comment below that directly addresses this: http://www.mattcutts.com/blog/toolbar-indexing-debunk-post/.

brudgers · on Feb 1, 2011

I did not say "similar data" because "similar" is a bit too slippery a word in a technical context. There's too much plausible deniability. What I am asking is if Google's tools send data back to Googleplex to be mined for the sake of search engine improvements.

stanleydrew · on Feb 1, 2011

Then what use is the word "similarly" in your comment? Similarly send? As in via HTTP requests? I think that's either obvious or irrelevant or both.

Again, if you actually read the article, you will come across the section titled "What About The Google Toolbar & Chrome?" I encourage you to read it.

[edit] Also, see this comment and patio11's subcomment further down the page, both of which were written an hour before yours: http://news.ycombinator.com/item?id=2165469#score_2165578.

pbz · on Feb 1, 2011

Quote from the article: "In fact, Google stressed that the only information that flows back at all from Chrome is what people are searching for from within the browser, if they are using Google as their search engine."

I'm pretty positive that's not true. If you run Fiddler when browsing with Chrome you will see constant hits to toolbarqueries.clients.google.com whether you're using Google or not. I could be browsing some MS site and toolbarqueries.clients.google.com gets hit. Chromium doesn't do this.

Edit: You can uncheck everything under privacy and it will still send those requests.

Edit2: What it sends back looks something like this:

<?xml version="1.0" encoding="UTF-8"?><autofillquery clientversion="6.1.1715.1442/en (GGLL)"><form signature="8551191143090325242"><field signature="620769395"/><field signature="2995202485"/><field signature="2175865763"/><field signature="904516291"/><field signature="2953051246"/><field signature="2649047790"/><field signature="2308153337"/><field signature="1003471793"/><field signature="3255484099"/><field signature="1305698505"/><field signature="3676143819"/><field signature="1275502930"/></form></autofillquery>

Looks like auto-fill data, but this happens when I click around a site, NOT when searching Google or typing something in the address bar. For some sites (interestingly, not all) it sends 3 requests for each page load.
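For anyone who wants to poke at a capture like this, here is a minimal sketch, assuming Python's standard library, that pulls the signatures out of the body. The `summarize` helper is a hypothetical name, and the payload is trimmed to three of the twelve fields above. It only shows that the body carries opaque numeric signatures; it says nothing about what Google does with them.

```python
import xml.etree.ElementTree as ET

# Payload abbreviated from the capture above (three of the twelve fields kept).
PAYLOAD = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<autofillquery clientversion="6.1.1715.1442/en (GGLL)">'
    b'<form signature="8551191143090325242">'
    b'<field signature="620769395"/>'
    b'<field signature="2995202485"/>'
    b'<field signature="2175865763"/>'
    b'</form></autofillquery>'
)


def summarize(payload: bytes) -> dict:
    """Return the client version plus the form and field signatures."""
    root = ET.fromstring(payload)
    forms = []
    for form in root.findall("form"):
        forms.append({
            "form_signature": form.get("signature"),
            "field_signatures": [f.get("signature") for f in form.findall("field")],
        })
    return {"client": root.get("clientversion"), "forms": forms}


if __name__ == "__main__":
    print(summarize(PAYLOAD))
```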

stanleydrew · on Feb 1, 2011

That's troubling. I'd be very interested in seeing a response from Google about this. Are you aware of any? Also, can you use Fiddler to inspect the content of the requests? I'm not familiar with the tool.

orrc · on Feb 8, 2011

I see this too, if I have autofill enabled, and at least one autofill address entry.

I would guess that Chrome is sending a hash of the <form> (perhaps URL + method?), plus a hash of each of the <input> tags, and Google returns some sort of information about what kind of form it is?

If so, it would mean it's pretty easy for Google to determine which sites you're on from the pattern of hashes sent for each site. e.g. I see this data sent in the clear for pretty much every page on https://www.facebook.com/

neild · on Feb 1, 2011

Is this malicious site detection by any chance, or does that use a different mechanism?

pbz · on Feb 1, 2011

Only thing I could find is this: http://code.google.com/p/chromium/issues/detail?id=57527

and this: http://code.google.com/p/chromium/issues/detail?id=60422

DenisM · on Feb 1, 2011

>I believe Microsoft sends the data back by SSL, so it's harder to verify even that than you'd expect.

Please. Adding my own SSL cert to my own laptop is not harder than I'd expect. Certainly not harder than many other things you did in setting up this experiment.

poacher69 · on Feb 1, 2011

are you claiming that google never scrapes bing search results pages? or any other search result pages?

Matt_Cutts · on Feb 1, 2011

poacher69, we crawl the public web. Anyone that blocks us out with robots.txt, we won't crawl. If you check bing.com/robots.txt, it has "Disallow: /search" . So no, we won't crawl Bing's search results pages. If anything, users tend to complain when search results from Lycos or wherever show up in Google.

underdown · on Feb 1, 2011

http://www.bing.com/robots.txt

  User-agent: *
  Disallow: /search

Funny thing: http://www.google.com/search?q=site%3Abing.com%2Fsearch%2F

I was gonna call out Matt for crawling bing's search results but I'm guessing Microsoft hasn't realized they return results from the /Search/ folder. ;)

nitrogen · on Feb 1, 2011

Once again Microsoft is bitten by expecting case insensitivity.
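A minimal sketch of the case issue, assuming Python's standard-library `urllib.robotparser` as a stand-in for a crawler's rule matching. The `ExampleBot` user agent and the `allowed` helper are made up for illustration, and how Googlebot itself matches paths is not shown here. The rules are the two lines quoted from bing.com/robots.txt above.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted from bing.com/robots.txt in this thread.
rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /search"])


def allowed(path: str) -> bool:
    """Check a path on www.bing.com against the rules for a hypothetical bot."""
    return rules.can_fetch("ExampleBot", "http://www.bing.com" + path)


if __name__ == "__main__":
    # Prefix matching in this parser is case-sensitive, so only the
    # lowercase path is covered by "Disallow: /search".
    for path in ("/search", "/Search/"):
        print(path, "allowed" if allowed(path) else "blocked")
```

With this parser, `/search` is blocked and `/Search/` is not, which is consistent with the indexed `/Search/` URLs underdown found.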

poacher69 · on Feb 2, 2011

matt, how does google do competitive relevance evaluations without scraping Bing?

illdave · on Feb 1, 2011

From my experience, Googlebot doesn't crawl pages that are blocked in robots.txt files. Check out Bing's robots.txt: http://bing.com/robots.txt - notice how /search is disallowed. That typically means that Googlebot isn't able to access that page. The same for the other search engines, it's more down to if they specify (through robots.txt) that Googlebot isn't allowed to crawl those results.

brudgers · on Feb 1, 2011

Given that Google appears to have an active program to monitor the results of search queries on Bing and to track Bing's page rankings and the ways in which they change over time (how else is this more likely to have come to their attention?), they should hardly be shocked, shocked to find Microsoft doing something similar.

[edit] I have always suspected that the real value of Bing for Microsoft is to prevent Google's data mining of queries originating in Redmond.

user24 · on Feb 1, 2011

... and feeding that data back into their own search results, rather than just using it for analysis to see how your competitors are performing?

If I'm in the business of giving horse racing tips and I read your tips to see what your strike rate is compared to mine, that's one thing. If I start tipping the same horses as you, purely because you tipped them, that's quite another thing.

brudgers · on Feb 1, 2011

Microsoft isn't using strike rate data in the important sense which you imply - the strike rate for Google is advertising revenue. It's not as if Microsoft is collecting info on the advertisements displayed and then soliciting those advertisers to spend their dollars on Bing (at least that's not part of the allegations). I am pretty confident that Google feeds every bit of legally collected relevant data back into their search algorithms.

user24 · on Feb 1, 2011

no, the "strike rate" is precision/recall. Good advertising CTRs are a side-effect of relevancy. Relevancy is measured by precision and recall.

edit to expand: If the measure by which a search engine evaluated itself was advertising revenues, they'd all have massive intrusive adverts, and no users. The only viable measure can be the quality of the search results themselves. As a happy coincidence, if you build something capable of delivering high quality results, you can very easily use that to produce highly relevant adverts. Imagine that each advert is like a little webpage, and rank them just the same as you do for normal webpages. (caveat: there's no link graph for adverts, so we're reduced to using a simpler text mining approach, eg bag of words vector space la-di-da).
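To make the two measures concrete, here is a minimal sketch in Python. The function name and the toy document IDs are made up for illustration: precision is the share of returned results that are relevant, and recall is the share of relevant documents that were returned.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document IDs the engine returned
    relevant:  set of document IDs a human judged relevant
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data: four results returned, three documents actually relevant,
    # two of which ("a" and "c") were among the results.
    p, r = precision_recall(["a", "b", "c", "d"], {"a", "c", "e"})
    print(f"precision={p:.2f} recall={r:.2f}")
```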

brudgers · on Feb 1, 2011

I believe that Google and Microsoft evaluate their search engines by entirely different measures: Google primarily by advertising revenue, Microsoft primarily by preventing searches using Google. Sure the ad money is nice for Microsoft, but they don't need it for their business to be profitable and they would be doing major research into search anyway because of its importance to businesses - they sell databases after all.

This whole episode points to the sort of counter-espionage operations the two companies are engaged in. Look how important a propaganda victory is for Google? It strains credulity to believe that the release of this information on the day of the panel discussion is pure coincidence.

user24 · on Feb 1, 2011

Microsoft, historically, have been kings of the desktop. With the rise of the web (which post-dated MS's rise), the desktop has become less and less relevant. Google is fast replacing them - my email, documents, search, advertising, analytics is all handled by Google. I don't use Microsoft for anything in my day-to-day life. Even on my main windows machine, my files are in my dropbox, outside of MS's control.

MS are desperate to regain control. Google will soon launch their own web-centric OS properly, and bam, MS will have no business apart from selling to an ever-dwindling number of companies who can't believe MS don't rule the roost any more. In 20 years they will simply cease to exist if they can't come up with a world-beating online product and win back control of people's computing lives.

Notice how they're diversifying into games and search in order to prepare for the worst case; that their core OS and 'boxed software' business fails.

SoftwareMaven · on Feb 1, 2011

Which may explain why Google is very successful and Microsoft is forced to buy its customers (and have huge losses in that division).

brudgers · on Feb 1, 2011

I think you may misunderstand my position. Bing is basically a research project for Microsoft. The ad revenues don't really matter, the data they collect does - it's learning v. earning. It makes sense for Microsoft to spend a billion dollars because better search algorithms have application for their B2B products and services. If they recover some of their R&D costs directly through advertising revenue, that's a windfall to the overall bottom line.

podperson · on Feb 1, 2011

Bing copying Google's search results is just evil. Not like copying Apple's iPhone design and user interface and giving it away to Apple's competitors, which is good. Right?

It's ugly and immoral and probably legal. Good job catching Microsoft at it (and I think the really really unethical and scary bit is that Microsoft is cheerfully stealing info from users via their browser). I also realize that Google got where it is in Search by innovation and iteration, and that Google's search team has nothing to do with Android per se, but you might see how Apple people feel about the business empires built on stealing their ideas.

nlogn · on Feb 1, 2011

I don't think Android devices being similar in some ways to the iPhone is even remotely analogous to what Bing is doing here. One is called healthy competition (and I don't see how Android is copying iOS). The other is literally just copying data.

othermaciej · on Feb 1, 2011

The original Android phone design looked like this: http://tctechcrunch.files.wordpress.com/androidphone.jpg

Then iPhone came out and it looked like this: http://km.support.apple.com/library/APPLE/APPLECARE_ALLGEOS/...

Now Android phones look like this: http://farm3.static.flickr.com/2795/4208849005_dd4b608729.jp...

You don't see any signs of copying here?

nlogn · on Feb 2, 2011

They are similar in that they both are big touchscreens with no physical keyboard. Apple did not invent that by any means. That is also a shot of the Android App Drawer and not an actual homescreen. Most Android homescreens I've seen have a few widgets on them and do not look anything like a big grid of app icons like the iPhone. Also, that "original" Android phone design you point out looks a hell of a lot like a Blackberry. Android must have blatantly copied RIM by your logic.

magicalist · on Feb 2, 2011

turtles all the way down, my friend: http://1.bp.blogspot.com/_YgpHa1oeHME/TTxZAZWLaxI/AAAAAAAAC_...

podperson · on Feb 2, 2011

Yeah, point to Palm -- a company built by ex-Apple people that produced a lower-cost Newton clone.

spot · on Feb 1, 2011

FYI Android started before iPhone was released. It's based on the Sidekick.

Vlasta · on Feb 1, 2011

I usually criticize Microsoft, but I am on their side in this case.

Why shouldn't Microsoft be using this kind of data? Google search result pages are part of the internet just like any other publicly available web site. Microsoft monitors what the users are clicking on Google and probably on Bing and other sites. So what? Monitoring users is not a new thing. It may be unethical and I may personally hate it, but almost everyone is doing it.

Google should stop whining about this and make their search result the best they can. If they had the best search engine, Bing could come close, but never overcome Google by just copying part of it.

kenjackson · on Feb 1, 2011

Great panel session. One of the best I've seen. I think you made your point, although I think MS did a good job neutralizing it too.

My question for you Matt... is there any way for Google to build a toolbar that effectively does what the Bing toolbar does (or even a joint one?). I jump to use the various search engines because no single search engine is sufficient. But clearly when Google isn't sufficient, you don't get the value of when I go to Bing. And vice-versa (as I don't use the Bing toolbar currently, but that may change now). Or do you feel that with 65% of the market, you don't need this info?

justlearning · on Feb 1, 2011

I can't argue about legal vs. illegal. I sure would like to tell the Bing team that they should try to better Google in their attempts. This strategy is mere copying in order to say 'look, I am as good as Google'. Microsoft never learns from the past, do they?

kgermino · on Feb 1, 2011

Please keep us posted as best you can.

Matt_Cutts · on Feb 1, 2011

I believe the search panel with Google and Bing will be live-streaming, so anyone can tune in. It's by a group called Big Think.

illdave · on Feb 1, 2011

Clickable: http://bigthink.com/series/62

cool-RR · on Feb 1, 2011

I couldn't find the streaming video from there. Did anyone find it?

Splines · on Feb 1, 2011

According to the description it'll be live at 10AM PST (a little before this comment is 2 hours old).

illdave · on Feb 1, 2011

I guess they'll put the streaming video up when it starts (10am PST, so in just under 2 hours). That seemed like the most appropriate page to link to.

lda · on Feb 1, 2011

Looks like the site can't handle the traffic as of now (1.20pm ET).

bluelu · on Feb 1, 2011

You are just jealous that your engineers haven't thought of this yourself.

I bet it is just recording general searches (input query) + clicked links. A pretty good idea.

And don't tell me you are not using the results from the google toolbar to rank the sites in google search.

sjs382 · on Feb 1, 2011

Is there a chance this panel is being recorded? I'd love to be a fly on that wall :)

sjs382 · on Feb 1, 2011

Oops, sorry. Found the link elsewhere in this thread!

http://news.ycombinator.com/item?id=2165861

KirinDave · on Feb 1, 2011

I can't help but feel a little surprised that Google found it ethical to lie on their search results page for any reason; and clearly this trojan horse page was a lie. Certainly at every search engine I've worked at we always said, "We can put up ads and help, but we can never outright lie." I suppose that's ameliorated by the fact that this was an internal experiment, but still..

The temptation to abuse that power is pretty big.

bbatsell · on Feb 1, 2011

I thought this was the most interesting part:

> The day after that, Bing contacted me. They were hosting an event on February 1 to talk about the state of search and wanted to make sure I had the date saved, in case I wanted to come up for it. I said I’d make it. I later learned that the event was being organized by Wadhwa, author of that TechCrunch article. [emphasis mine]

So the supposedly independent author of an article on TechCrunch that kicked off a massive wave of Google criticism is, less than a month later, organizing events specifically for a Google competitor? Boy, that sure seems above-board.

ig1 · on Feb 1, 2011

Matt Cutts from Google and Rich Skrenta from Blekko are also speaking at that event (not to mention Peter Thiel, Esther Dyson, and Malcolm Gladwell). It's an industry event that happens to be sponsored by Bing, it's not really an MS event though.

simonk · on Feb 1, 2011

Still, a TechCrunch & BusinessWeek contributor that gets paid by Bing is interesting.

cliq · on Feb 1, 2011

and techcrunch has long been hosting google ads. so.... what?

KirinDave · on Feb 1, 2011

Uhh... Yeah? Everyone in search does this. I've worked at and with 3 major search engine initiatives, and we all tested heavily against Google in a variety of ways.

But the article definitely gets a few things wrong. For example, having worked at Bing I can tell you this: in general "obvious" misspellings are autocorrected without comment. It's not some sort of magical copying procedure, it's actually a policy. Want proof? Here's an example query you can repeat: http://fayr.am/4KdG (direct query link: http://fayr.am/4JZD)

But otherwise, shit yes everyone is scrutinizing google trying to figure out what they're doing. That doesn't mean other players aren't doing their own optimizations, or even running relevancy metrics against other search engines. Relevancy is not a concept with fixed metrics, and every player in the search market does everything they can to figure out what their competitor is doing.

And even the raw results leakage is fairly par for the course. It's not like Bing searches are a crawl of google searches; Microsoft gets this data from browsers running this toolbar and uses it to help shore up queries where they don't return good results.

sjs382 · on Feb 1, 2011

No surprise, as both DDG and Blekko disclose that they use Bing for long-tail queries, but it works at both of those engines, too:

  http://duckduckgo.com/?q=hiybbprqag
  http://duckduckgo.com/?q=mbzrxpgjys
  http://duckduckgo.com/?q=indoswiftjobinproduction

  http://blekko.com/ws/hiybbprqag
  http://blekko.com/ws/mbzrxpgjys
  http://blekko.com/ws/indoswiftjobinproduction

Matt_Cutts · on Feb 1, 2011

Ouch, you're totally right. So you're saying that because Blekko and DDG are powered partly by Bing, the rankings on all three engines are showing the results that Google inserted?

epi0Bauqu · on Feb 1, 2011

They shouldn't be in DDG. That's one of the pieces of intelligence we try to add on top. We try to not "correct" your query. Here's another example of a real search my wife did the other day that illustrates this point on both Google & Bing: http://duckduckgo.com/?q=suttle+renal vs. http://www.bing.com/search?q=suttle+renal vs. http://www.google.com/#&q=suttle+renal

sjs382 · on Feb 1, 2011

I love that you don't try to correct queries. I hate when Google auto-corrects my query and I need to click another link to get to my actual query. Thanks for that!

alecco · on Feb 1, 2011

But in this case DDG does give Google's auto-corrected results (without even telling you!) and I bet it happens for many terms, as it is mostly a proxy/filter of Bing.

epi0Bauqu · on Feb 1, 2011

No, actually they shouldn't be in there. Are you seeing them? If so it is a bug. It is not showing up on my computers.

callahad · on Feb 1, 2011

I'm still seeing it on tmbg, but thor and www seem fine:

http://tmbg.duckduckgo.com/?q=hiybbprqag

http://thor.duckduckgo.com/?q=hiybbprqag

http://duckduckgo.com/?q=hiybbprqag

Edit: The same appears to be true for mbzrxpgjys and indoswiftjobinproduction

Edit 2: Hey, that's weird. Adding a comma, semicolon, period, or other symbol to the beginning or end of the query makes the gamed results show up on top at thor and www as well. Seems to work for all the terms at issue:

http://duckduckgo.com/?q=hiybbprqag%2C

http://duckduckgo.com/?q=----hiybbprqag----

epi0Bauqu · on Feb 1, 2011

You're finding the edges of my intelligence layer :). Should be fixed now.

Note that some of those servers don't get updated often.

sjs382 · on Feb 1, 2011

Ah. They were loading the same results as Google and Bing when I posted that. They aren't now.

epi0Bauqu · on Feb 1, 2011

There were a number of bugs around the first result that this whole thing uncovered, so it could have been intermittent. Our backfill in the case of no results can vary. It shouldn't have been showing anything ever beyond the first result though.

sjs382 · on Feb 1, 2011

Thanks for the clarification. I use DDG as my primary search, and love reading about how it works.

alecco · on Feb 1, 2011

It was there using those links when I posted, but not anymore.

yaix · on Feb 1, 2011

Or you just "quote" your original misspelled query and Google will search for the misspelled version without any additional click.

epi0Bauqu · on Feb 1, 2011

It wasn't a misspelling though! That's the point. She is a cancer researcher and she meant that. Renal is for kidneys and Suttle is an author on various papers in the field.

sjs382 · on Feb 1, 2011

Quotes provide you with exact match results. Putting quotes around an entire phrase isn't what I intend sometimes, and quoting each term in the query is just excessive. In that case, I'll just click the extra link...

ZoFreX · on Feb 1, 2011

A timesaver that might be useful on the edge cases is to prefix terms with a plus sign. +the +quick +brown +dog is easier to type than "the" "quick" "brown" "dog", and should have the same results.

yaix · on Feb 1, 2011

You're totally right, nice find.

Just happy that my current SE favourite does not seem to copy and to actually have their very own results: http://entireweb.com/#q=hiybbprqag

sjs382 · on Feb 1, 2011

But the 3rd result (at the moment) is "Vanessa fox Nude"...

vanessafox · on Feb 10, 2011

Well, I'm not seeing that result now, likely because the results are now filled with actual news about hiybbprqag. But if Vanessa Fox Nude was ranking then that likely means that some signal was associating hiybbprqag with Google and that either they're using (at least in part) a really old index or the crawler they're using doesn't follow redirects very well.

vanessafoxnude.com has been redirecting to my current site for several years now, but back when the original site was active, much of the incoming anchor text was related to Google and search.

wccrawford · on Feb 1, 2011

It may not be illegal or 'cheating', but it's incredibly stupid for the same reason cheating is: Without the person you are cheating from, you can't pass the test!

In this case, the customers don't get relevant results unless other potential customers use the competition! In short, Bing's results are only good if Google is popular.

Why would you invest time relying on your competition? Shouldn't you be striving to match or beat them, rather than trying to piggy-back on them?

kenjackson · on Feb 1, 2011

Isn't this just the McDonald's v Subway/Burger King example? McDonald's has the research and foot traffic. Rather than do your own, watch where the successful McDonald's go and then put your restaurant across the street.

If you have a bunch of users searching for "XYZ" on a different search engine and consistently going to link A -- wouldn't that imply it was relevant? You'd do the exact same thing for searches on your own search engine. The only difference is people have opted in to allowing you to have this info _implicitly_ by going to your search engine vs giving you this permission _explicitly_ by clicking through the EULA for the toolbar.

tygorius · on Feb 1, 2011

Indeed. Drucker somewhere makes the point that a business's key activities are innovation and marketing -- which implies you don't necessarily have to be innovative in your marketing, just good at it. Hmmm... I was just thinking that rather than do my own exhaustive search for startup companies to invest in, I'll just check who has gotten support from Y Combinator and offer them a deal, piggybacking off of Paul Graham's work. Wait, it's been done? Oh, never mind.

As a practical matter, I doubt customers will care so long as they've always had the option of turning off that part of IE's behavior. I mean, when did you last care about the authenticity of your phone directory's information?

powrtoch · on Feb 1, 2011

That sounds all well and good, but in terms of cost vs. benefit, it's way easier to make a good application by stealing the years of hard work of the industry leaders than it is to reinvent everything and try to come up with your own clever tweaks to improve it. If you could do it without getting caught, it would practically be a no-brainer.

karanr · on Feb 1, 2011

Yes, but it is one thing if a small startup does it, and it's another if a giant like Microsoft does it. In a way it's applying a double standard, but I think Microsoft has enough money to invest in innovating in the search space. Ultimately you can't really prove it's illegal, so it's a matter of ethics. Strangely, we would all encourage a startup to copy what it can, so it can focus on the core innovation and not be bogged down, but we look down on Microsoft. Why is that?

robryan · on Feb 1, 2011

Kind of indicative of where the internet is heading these days: a few big platform players which everyone else is piggybacking on. Like Windows Live Messenger attempting to stay relevant by combining Facebook chat with its own network.

DonnyV · on Feb 1, 2011

I think you're missing the goal. The goal is to sell more advertising, not to get better searches.

tptacek · on Feb 1, 2011

"Is it illegal? Is it cheating? Is it unfair?" Who cares? Google already got everything it needed out of this situation: a gigantic PR win, and a morale boost for their own team. Well played.

ghurlman · on Feb 1, 2011

I'm not sure how gigantic that PR win is, if I had to come to HN to hear about it.

tptacek · on Feb 1, 2011

Give it a day to percolate and then we'll see if I'm wrong. I obviously might be.

bostich · on Feb 1, 2011

just after i read this i see an article on this on the frontpage of financial times germany

http://www.ftd.de/it-medien/medien-internet/:suchmaschinen-k...

headline: googles trap for microsoft

trustfundbaby · on Feb 1, 2011

give it time ... that headline sells itself. If Google pushes the issue, it should be on every night news report by tomorrow night.

ghurlman · on Feb 2, 2011

I'm not sure Google wants to get pulled into a conversation about automated tracking of user search data.

PR "wins" can become PR nightmares in a blink.

trustfundbaby · on Feb 2, 2011

Told ya!

georgemcbay · on Feb 1, 2011

Not a PR win for Google from my point of view. I'm a huge Google fanboy (daily user of the search engine, Gmail, Google Apps, Android, Google Voice, etc), but this whole situation is a PR stain on them for me.

Firstly because I think they originally misunderstood the manner in which Bing's results were being influenced by their own, and then secondly because if they are going to complain about Microsoft collecting information about their users' usage patterns -- well, that's really, really hypocritical coming from Google. Lastly because the whole thing smacks of high school level gossip. If Microsoft is really doing something out of line, handle it in some other way than engaging in a gossipy blog war.

To reiterate, I'm actually a Google fan, I'm OK with trading some privacy for useful services, but if they are going to bang on Microsoft for collecting user usage information, well that's about the worst case of the pot calling the kettle black I've ever heard of in the tech industry.

jobu · on Feb 1, 2011

I'm happy for Google. Now can they get back to improving their search results? The changes they made last week (http://news.ycombinator.com/item?id=2152286) did help some technical searches, but many results are still being overwhelmed by SEO spammer crap.

pinko · on Feb 1, 2011

Got any good examples? Looks much better to me.

jerf · on Feb 1, 2011

I agree in principle, disagree in detail; this is a loss for Bing. Google isn't going to get much further benefit out of this but Bing is going to have egg on its face.

Discussing the hypotheticals of the situation as others are doing is interesting (serious), but irrelevant. The court of public opinion isn't going to care about that nuance and will find against Bing if this goes viral. All of the other defenses won't matter either, "everyone is doing this" and so on. Public opinion won't care.

huherto · on Feb 1, 2011

I have to agree with you. After reading all comments in the thread it seems like the best bet for Google is to use this to make MS look bad. Using the cheating in school test analogy. If you get caught cheating, it is a blow to your reputation.

dansingerman · on Feb 1, 2011

My take is this: the whole Google ethos is that they are trying to have the best algorithm to give the best results. Outside of this sting they have always been at pains to put forward the view that nothing is manually ranked.

I think the same thing applies to Bing here: if they have a generic algorithm that ranks results based on toolbar (or other data) it could be easy to see how their data is skewed by Google given the amount of traffic Google search gets compared to the rest of the internets. This seems fine to me.

But if their algorithm does stuff with activity on google.com because it is google.com then this is a pretty clear foul - it is both essentially copying, and the equivalent of manually ranking results (specifically, Google results)

The corollary of this is that if their algorithm is generic, then it will still work if Google were to cease to exist. If it's not generic, it would be useless without Google.

gojomo · on Feb 1, 2011

When asked by SearchEngineLand, Google's Singhal seems to imply Google Toolbar clicktrail data is never used for ranking, but his wording is actually a bit vague:

> Absolutely not. The PageRank feature sends back URLs, but we’ve never used those URLs or data to put any results on Google’s results page. We do not do that, and we will not do that.

"Put any results" could be read narrowly as "this data isn't used to add pages to the index", or more generally as "this data isn't used to rank results relative to each other". Also, Singhal's pledge that "we will not do that" is much stronger than any statement I've ever seen in any Google privacy policy, which all pretty much say Google may use any info they have to improve their services.

Matt Cutts, can you clarify if Singhal in fact meant the 'narrow' or 'general' interpretation above?

And, if the 'general' meaning, then is there any statement about the use of clicktrail data in Google's published privacy policies that is as strong as Singhal's?

zyb09 · on Feb 1, 2011

> It strongly suggests that Bing was copying Google’s results, by watching what some people do at Google via Internet Explorer.

Wow, it almost seems that is exactly what they are doing, which is some pretty dirty stuff. Now MS always had a shady track record, but I thought recently the company got a lot better.

tptacek · on Feb 1, 2011

You're crazy if you think every player in this space won't opt into "shady" if it means a better user experience. This is just a "shady" we get to hear about.

bad_user · on Feb 1, 2011

Like companies give a crap about "user experience" unless that means earning more and more money.

Piggybacking like this should really be copyright infringement or something, as there's nothing morally right about it.

On the other hand, Google should be more quiet about this; after all, they've built their business database for Google Maps / Google Places by piggybacking on third-party services like Yelp and TripAdvisor. And now all of a sudden when I'm searching for "restaurants" I have to scroll the page to get past Google's own crap.

samwise · on Feb 1, 2011

The real question should be why is Google not doing this. Bing seems to be learning from what results users choose and improving their results.

Seems like a no brainer, unless i missed something.

I also really like this for some reason. It's very ... gangster. Shows that bing is scrappy and willing to bend the rules.

That being said, i will still continue using Google.

samwise · on Feb 1, 2011

not sure why i'm being downvoted. if you don't agree leave a reply.

it's very discouraging

raganwald · on Feb 1, 2011

Your comment began at +1. It is now at -1. So of all the people that looked at it, some modded it, and of all those people, there are two more people who downmodded it than upmodded it. Two.

Right now the post has +186. So assuming that half of those people read your comment, we have something like 93 people who read your comment, and there are just two more downmods than upmods.

Unfortunately, HN shows people a comment's score before they read it, which skews results. People aren't objective: they tend to upmod comments that are already upmodded and downmod comments that are already negative.

So if you are unlucky and the first one or two people to read your comment are constipated, you get a negative result through no fault of your own. Bad luck, try again.

Please don't be discouraged. Try to be helpful and constructive and to present a point of view that others may have missed in the conversation. The upmods will eventually follow. Cast your seeds and let a thousand flowers bloom.

earlyresort · on Feb 1, 2011

Sort of petulant on Google's part to release this, no?

Of course your competitors are going to copy you. It's not innovative, and you might consider it 'cheating' if you forget that each and every one of us is building off of a foundation laid by other people. But it works, and that's why it happens and will continue to happen.

random42 · on Feb 1, 2011

You are misunderstanding the word 'copy'. Here, it does not mean something like "3-D movies are a great idea, let's copy it from Avatar". It's more along the lines of "This code is great, let's copy it".

Copy == getting inspired by the brilliance of an idea and implementing it on your own, in the first case.

Copy == stealing in the latter case.

Copying (Ctrl+C, Ctrl+V) Google search results is theft, not "getting inspired" by a previous body of work.

Edit - Made the PoV clearer.

earlyresort · on Feb 1, 2011

No one's stealing Google's code.

Here, it means 'we have evidence, given to us by our users who agreed to share their web traffic with us, that showing this result for this query is a great idea -- so let's do that.'

People obtain data on their competitors' performance all the time and tailor their products accordingly. It's not theft, it's competitive intelligence.

random42 · on Feb 1, 2011

> People obtain data on their competitors' performance all the time and tailor their products accordingly. It's not theft, it's competitive intelligence.

Yes, but in this case, it's more like claiming a competitor's product (the search result) as your own, directly in your product. I would consider it theft.

encoderer · on Feb 1, 2011

No it's not. That's absurd. A clicked search result is a successful product. Bing is taking note of a competitor's successful product and using that information in its own decisions on the products (serps) it produces for its users.

This is a bunch of microsoft haters making hay over nothing at all. Quit whining. It's not theft, it's not any more privacy-offensive than anything Google does, get over it.

random42 · on Feb 1, 2011

I agree with you 100% if that's the case (and Matt Cutts' comment suggests it is). My argument was for the case where Bing crawls Google's results pages (the programmatic equivalent of searching on Google and feeding the results into its own db as results for the term), without considering user click information.

brudgers · on Feb 1, 2011

When identical inputs create identical outputs using different algorithms, it's reverse engineering, not copying.

random42 · on Feb 1, 2011

Except the algorithms are not really different. One algorithm is just a wrapper over the second algorithm. The crux of the matter is really just in the second algo.

brudgers · on Feb 1, 2011

It's not a wrapper. Bing isn't passing requests to the Google API and then returning the results on the Bing page. Keep in mind that 93/100 of the seeds Google injected into Bing's database were filtered out.

antimatter15 · on Feb 2, 2011

Or: "3-D movies are a great idea, let's take scenes from Avatar and put them in our movie." I don't think excessive simplification does justice to the issue. Just as piracy isn't necessarily theft, what Bing is doing isn't necessarily copying.

tzs · on Feb 1, 2011

Do I have this right?

1. User does a search in a Microsoft toolbar, using Google as his search engine. User is searching for $terms.

2. User gets a results page. User clicks on the entry in the results for $site.

3. Toolbar sends back to Microsoft that the $site was the first result the user chose for $terms.

4. Bing uses this to increase $site's placement in searches for $terms.
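The four steps above can be sketched as a toy ranker. This is only an illustration under stated assumptions: `ClickBoostRanker`, `report_click`, the additive boost, and the example sites and scores are all hypothetical, and the real weighting (if any) is not public.

```python
from collections import defaultdict

class ClickBoostRanker:
    """Toy ranker: a site's base score plus a small bonus per reported click."""

    def __init__(self, boost_per_click=0.1):
        self.boost_per_click = boost_per_click
        self.clicks = defaultdict(int)  # (terms, site) -> click count

    def report_click(self, terms, site):
        # Step 3: the toolbar reports that `site` was chosen for `terms`.
        self.clicks[(terms, site)] += 1

    def rank(self, terms, base_scores):
        # Step 4: reported clicks raise a site's placement for these terms.
        def score(site):
            return base_scores[site] + self.boost_per_click * self.clicks.get((terms, site), 0)
        return sorted(base_scores, key=score, reverse=True)

ranker = ClickBoostRanker()
base = {"a.example": 1.0, "b.example": 0.9}

before = ranker.rank("widgets", base)
ranker.report_click("widgets", "b.example")
ranker.report_click("widgets", "b.example")
after = ranker.rank("widgets", base)
```

Note that nothing in the sketch looks at which search engine produced the results page, which is exactly why the Bing-on-Bing question below is a fair one.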

An interesting question then would be whether or not Microsoft also "copies" from Bing? That is, if you are using Bing as your search engine, do they still use the fact that you went to $site after searching for $terms to adjust the rankings?

duke_sam · on Feb 1, 2011

So in an effort to be as good as a competitor, MS is watching what you do when you interact with that competitor's website and sending that information home. Seems like a really big reason to suggest to anybody you know that they uninstall the Bing toolbar.

topcat31 · on Feb 1, 2011

Google gathers lots of user data on 3rd party websites via services such as (to name a few):

- Google Analytics (opted in for data sharing)
- Chrome
- Google Toolbar

@Matt Cutts - I'd love it if you could confirm exactly which user data you DO and DO NOT use to influence rankings. Or, at the very least, say on record that you don't do what Bing are doing and use data from bing.com.

Overall, I'm not surprised that Bing are doing this for some keywords - all the major search engines use a massive number of different signals. I'll be more surprised if it turns out this is happening at a large scale or for competitive terms.

bambax · on Feb 1, 2011

Looks like Microsoft is "innovating" again...

shadowfox · on Feb 1, 2011

A great deal of value has been added to this conversation by this comment going by the upvotes.

bambax · on Feb 3, 2011

It's a little late for me to elaborate but here's what I meant: Microsoft has a long history of copying its competitors and calling it "innovation". Absent any other evidence, that tends to put the burden of proof on them.

gojomo · on Feb 1, 2011

It's instructive to think of the cases where Google can return a search result, even though the searched word doesn't appear on the page. Most often, this occurs because another site includes an outlink to the page, with the searched word. That is, they're 'copying' a publicly-available source that indicates that word is associated with that page.

I see this Microsoft tactic as similar. They're considering search terms that resulted in a visit to the page from other search engines as being important indicators of the page content. If they have that URL-to-URL-trail data legally, and the signal works well, and they are not singling out Google's URLs as the only source of such a signal, I'm not sure what the problem is.

Google didn't get where they are by throwing out legally-collected useful data, and Bing won't catch up to a leader who has clicktrail sensors everywhere, via analytics/toolbar/ads/mobile/etc., by throwing away legally-collected useful data.

DenisM · on Feb 1, 2011

TL;DR

1. Bing is inferring search results from user behavior, collected via Bing Toolbar.

2. Google team makes an experiment: using Bing Toolbar to feed Bing particular behavior. Namely, they all go from a search result page on Google.com laden with a unique word to a particular target site.

3. Bing infers connection between the unique word and the target site.

4. Google cries cheating.
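For what it's worth, steps 1 to 3 can be simulated in a few lines. This is a sketch, not a claim about Bing's implementation: `ToyEngine`, the `min_clicks` threshold, the fallback-only-when-the-index-is-empty rule, and the made-up query words are all assumptions chosen for illustration. It also includes the control case suggested earlier in the thread, a planted page that nobody clicks.

```python
from collections import Counter

class ToyEngine:
    """Hypothetical engine that uses click data only when its index has nothing."""

    def __init__(self, index, min_clicks=3):
        self.index = index            # query -> URLs found by crawling
        self.min_clicks = min_clicks  # assumed noise threshold
        self.clicks = Counter()       # (query, url) -> observed toolbar clicks

    def observe(self, query, url):
        # Step 1: behavior collected via the toolbar.
        self.clicks[(query, url)] += 1

    def search(self, query):
        organic = self.index.get(query, [])
        if organic:
            return organic
        # Step 3: infer a query -> site connection from clicks alone.
        learned = [
            (count, url)
            for (q, url), count in self.clicks.items()
            if q == query and count >= self.min_clicks
        ]
        return [url for _count, url in sorted(learned, reverse=True)]

engine = ToyEngine(index={"weather": ["http://weather.example/"]})

# Step 2: testers search a unique made-up word and click the planted target.
for _ in range(5):
    engine.observe("zxqvbn", "http://target.example/")

clicked = engine.search("zxqvbn")    # honeypot that testers clicked
unclicked = engine.search("qwmnbv")  # control honeypot nobody clicked
```

In this model only the clicked honeypot surfaces; if an unclicked one also appeared in a real engine, that would point to scraping rather than click inference.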

seasoup · on Feb 1, 2011

Wow, I'm surprised by all the developers on Microsoft's side on this one. Google spends a lot of money developing proprietary algorithms for determining search results. Microsoft is then stepping in and taking advantage of the money Google spent by copying some of their results. It's rather like someone taking the results of a Consumer Reports list and publishing it themselves. It borders on illegal, and it's definitely shady.

But what I think is more important is all of the flak that Google has been catching for supposedly slipping in its quality of search results. If its quality is so poor, then why is Bing stealing its results? It's a great method of striking back at the negative PR they've been receiving.

contextfree · on Feb 1, 2011

"If its quality is so poor, then why is Bing stealing its results?"

Bing is apparently using toolbar click data (AFAICT it hasn't been shown that this is specifically targeted at Google or even at search engines in general) when it has no other information for the given search term. That has very little relation to the quality of Google's search in general.

Bitmobrich · on Feb 1, 2011

Google copies Bing's layout for images. Bing copies Google's search results.

btipling · on Feb 1, 2011

The funny thing is that people with the Bing Toolbar are still using Google to do web searches.

yoyar · on Feb 1, 2011

MS has a long history of stealing and sabotage. This doesn't surprise me one bit.

darrenkopp · on Feb 1, 2011

So what? Is it a scandal that Walmart and Target both send employees into each other's stores and actively monitor prices on items? It's called being competitive, and to be competitive you have to at least match what your competitor is doing, then beat them.

random42 · on Feb 1, 2011

Bing responds.

http://searchengineland.com/bing-admits-using-customer-searc...

mkowalski · on Feb 1, 2011

It's interesting. A little bit like the browser wars, isn't it? Browsers are really similar to one another. If any noteworthy new feature appears in one, it is very likely to be copied by another, which is a very good thing for end users and is one reason why competition is good. At the end of the day, users want more or less the same functionality, no matter which browser they use. There are some differences in details and quality, but they are rather minor.

Both Bing and Google are targeted towards the mass market and I think people expect the same from both. If Google does it right, there is nothing more to invent. And even if there is, it is probably pretty expensive. It is so much easier to copy than to invent from scratch, just to get something almost exactly the same as Google :)

I am really interested in what Bing could do to be REALLY different or better than Google. And if they did, Google would most likely do something very similar :)