Google: Bing Is Cheating, Copying Our Search Results (searchengineland.com)
684 points by illdave on Feb 1, 2011 | 272 comments



Hmmm. Let's say that Bing sets up a script that sends queries to Google and then records the results. That's clearly copying. But what Bing does is when you use its toolbar, it watches what you do and uses that information to rank results. Is that really copying? It showed Google's Honeypot page because Google's engineers were clicking on the Honeypot page with the toolbar installed. That isn't copying Google's results, that's copying the actions of Bing toolbar users.

This can easily be demonstrated. Google can set up a second honeypot but instruct its engineers not to click on the link, ever. If it shows up in Bing's results, then Bing is watching what Google returns and scraping its results.

But if the second Honeypot doesn't show up in Bing's results, then clearly Bing isn't copying Google's results, it's copying its toolbar's preference for links.

The entire thing is moot to me. The takeaway isn't whether Bing copies Google. The takeaway is that Bing's toolbar is spyware :-)
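For what it's worth, the two-honeypot test described above boils down to a small decision table. A rough sketch, where the function name and the outcome labels are made up purely for illustration:

```python
def interpret(clicked_by_testers: bool, appeared_in_bing: bool) -> str:
    """Read the outcome of one honeypot run (labels are illustrative only)."""
    if appeared_in_bing and not clicked_by_testers:
        # Nobody clicked, yet the result showed up: points to scraping.
        return "scraping"
    if appeared_in_bing and clicked_by_testers:
        # Clicks alone could explain it, so the two causes can't be separated.
        return "clickstream or scraping"
    if clicked_by_testers:
        # Clicked but never surfaced: the signal may simply be too weak.
        return "inconclusive"
    return "no scraping observed"

for clicked in (True, False):
    for appeared in (True, False):
        print(clicked, appeared, "->", interpret(clicked, appeared))
```

Only the "no clicks, but it appeared anyway" row would separate scraping from click tracking, which is why the second honeypot needs a strict no-click rule.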


Let's say that Bing sets up a script that sends queries to Google and then records the results. That's clearly copying.

I'd question even that hyperbolic interpretation. Let's say that Google sets up a script that sends queries to websites, records the results, and incorporates the links shown on each site into its search rankings. Is that clearly copying? No, that's just pagerank.

If you have a web directory, a link page, a blogroll--isn't Google "copying" your work by using it to improve its search results? How is that any different from what Bing's doing?


> No, that's just pagerank.

This is my first thought as well. Google's pagerank analyzes the link structure of the web as one of the inputs to its search ranking. Apparently, Bing's toolbar analyzes page content coupled with user click behavior as one of the inputs to its search ranking.

These two things don't seem very different to me. Both of them are relying heavily on the value provided to them by tracking and analyzing the behavior of users on the web to drive search results.


I had the same thought. It is more a matter of framing. A while ago some people accused Google of unethically profiting by farming the link structure of the Internet, which is the labor of many people (was it Nicholas Carr?). I don't really buy this framing. But Google's accusation seems to fall along a similar line of argument. You can also set up a "Google sting" to prove they are copying from the Internet. It is called "Google Bombing".

Bing would be at fault if they specifically targeted Google. But if you consider that entering a keyword and then clicking a link is essentially targeting Google search, then it only exposes another problem: Google's monopoly on the search market.


We agree that it is copying, the subject open for interpretation is whether it's "wrong" in some way. Information is all about copying, that's the whole point :-)


but the link between the obscure query and the click on the page wouldn't have been made without bing knowing the user first searched for the query on google, no? if it were simply boosting page clicks that would be one thing, but how else could rim.com rank 1st for "mbzrxpgjys"?


What the experiment shows is that where no other data is available Bing will use what it has and that Google can successfully seed Bing on the long tail. What it doesn't show is that in typical circumstances Bing is relying on data gathered from Google searches.

Microsoft is collecting the same sort of information on Google queries that it collects on Bing queries and that Google collects on Google queries. All this is happening at the long tail, where both companies are most likely using something other than webcrawling to tailor search results. After all, the whole experiment is only possible because Google can seed page rankings at will to link arbitrary terms to specific search results.


Well yes, there is a hard link between a Google search result and a link being clicked; however, Google's argument isn't as strong if it turns out Bing is doing this for all search engines. It might be that they aren't targeting Google specifically, but are instead targeting all search sites generically.


But then still, the Bing toolbar is watching what you're searching for and recording that information. That's a pretty big privacy issue


Google doesn't really want to get into a heated discussion about the evils of a search engine knowing everything you've ever searched for. Stones, glass houses, etc.

(Given Google's near-monopoly of the market, Microsoft and DDG have some amusing competitive synergy going on, don't they. DDG can criticize Google all they please for retaining user data because DDG doesn't and isn't in a position to benefit from it. Microsoft, which certainly is in a position to benefit from it, doesn't need to worry about Google calling them on it because Google is the only search engine that can actually lose market share over the issue.)


For all Google’s sins, there is a Dashboard that lets you erase everything you'd rather they not know. Google could promote that heavily if it came to stones and glass houses. I always assumed this would be a great way to learn more about queries: spotting which words people are ashamed to have searched for.

It could be something a registered user could set from a browser toggle, and DuckDuckGo is a very good project, of course. My point was: data portability and user control are within Google's long term interest, not being evasive about their data cache.


While it's great that Google lets you delete information, harvest-your-data-by-default is not a choice made with your best interests at heart.


This is what all toolbars do, and is largely the point of why big companies offer them and pay little software companies to make them optional installs (see Corel's WinZip, which installs the Google toolbar)


And that is explicitly stated in their ToS. There's no hiding here.


Let’s agree “not hiding” is what is in the demo video. TOS… we all know they could add that you sell the soul and the virginity of your mother in there, no one would read it.


It only records that when the user has explicitly agreed to send anonymous data. Chrome and the Google toolbar do that too, and so do most of the toolbars out there.


I think this would still leave Google with a fairly strong argument - if Bing does it for all search engines, then they're effectively copying whoever is most popular. Since it's done through Internet Explorer, which is still bundled with Windows in most places, they could try to make the argument that Microsoft is using their position in the OS market to crush competition in other markets.


Interesting angle to go through tied markets and competition policy: that's a type of authority that is far more sophisticated, and one that only just prosecuted the bundling of IE with Windows. However, you'd have to either have a US court acknowledge that a European one was right to disagree with them in the first place, or have a European court admit that their previous decision wasn’t enough. It’s feasible, but hard.

Where you’ll be more limited with it, is that it’s apparently not IE, but the Bing Bar that is at stake—the connection is getting thinner.


If a page contains a unique word, and people who were on that page universally go to a different page, that could be enough evidence for bing to assume there's a link between the unique word and the target page.


There are browser toolbars that aren't spyware? I'd think anyone likely to actually hear about this would have already been assuming that all browser toolbars are spyware.


It might be a good exercise to chase down old posts of people who wrote the first browser toolbars, as well as the browser infrastructure that made them possible. We can contrast the speculation on why they might have been a good idea with the actual result. Not as a way to trash them, but as an exercise in how smart people can miss the mark.


More popular if done as a way to trash them, I expect. You could make "trash old usenet posts".com and have a ranking system for "worst decisions ever".

(C.A.R. Hoare's billion dollar mistake, for example).


As it happens, there's an exact historical precedent for this. Post code <-> long/lat data is copyrighted in the UK, but users were using Google Maps (and others) to do the conversion and supply the results to open source databases. The end result is that Google had to change their licensing/API to restrict this sort of behaviour.

Just because you're copying the data indirectly through a third party doesn't mean you're not breaching the copyright.


Interesting case to raise. Reminds me of the NFL terms voiced after every game about the broadcast being for private use only. I imagine that if their terms don't already include a clause like this, they can try to suggest that a toolbar tracking user clicks is violating the terms.

Very murky waters. If Google starts complaining that other people are tracking their users, they might end up educating users about how much they and their advertisers track.

...I hope DuckDuckGo figures out a way to capitalize on this brouhaha...


UK law seems to be unusually stringent on this though (e.g. FootballDataCo claims copyright on facts such as dates of football fixtures)


"The takeaway isn't whether Bing copies Google. The takeaway is that Bing's toolbar is spyware" And also that you can control search results in Bing. Nice feature for advertisers.


You might need to hold in your hands the ethics of the webspam team at a major search engine, though.


Only for terms that are worthless.


For all the money, resources and engineering talent that Microsoft has, you'd think they wouldn't need to do this, though. That's what's baffling to me.

Call Google the market leader all you want, but let's not forget that Microsoft's market cap is around 40 billion dollars greater than Google's.

That's more than Research in Motion's total value!


how would you expect a search engine to be able to surface a web-page for a specific query when there is no data to create the relationship except the signal from google engineers spamming bing with click data?

these are cases of outliers. they don't exist on the real internet, or at least where pages exist without any other data (anchor text, inlinks, outlinks, words from the query in the document) they never get surfaced from a search engine.

absent fake click-data there is no way google could surface these documents for the specific queries. in fact google states this openly in their "attack piece". before they manually changed the rank of these documents they didn't surface these either.

the only evidence of "cheating" is that bing surfaces documents for which there is no known relationship between the query and the document, except for spam created by google engineers. this is evidence only of a bug in bing's ranking algorithm. clearly it is using signals from google. just like google uses signals from CNN (keywords, inlinks, outlinks, anchor text, etc).

i'm sure bing is thankful to google for helping find this defect in their system and are hard at work to fix it.

people talk about bing copying search results like google invented search results and put a lot of hard work into them. in this case the only hard work they put in was designed to spam bing.

i can only conclude that google is getting worried about bing quality and has run out of ideas on how to fix their own problems.


all search engines make use of a variety of signals. Bing decided to use what users click on as a signal. Google spotted it and thought it was 'zomg bing are stealing our results'. I don't understand why you think taking advantage of a new signal to improve search is not a smart move by Bing?


The googlers are just angry that they couldn't come up with that genius idea themselves.


I don't see this as being any different from what Microsoft has been doing for 20+ years. They let a competitor put the work into figuring something out, then make a reasonably accurate facsimile thereof. I think it's lazy, but not particularly unethical. If Google were Benz, would they be complaining that Ford was making 4-wheeled vehicles with an engine? More appropriately, and given that I've been on a Top Gear bender lately, if Google were Cadillac would they be complaining that everyone else was copying their method for operating vehicles, with three pedals, a gear shift, a steering wheel and a handbrake?

I get why Google is upset, but this doesn't strike me as unethical behaviour in a free market.


Does anyone think that the Google toolbar doesn't do the exact same thing? Just sayin


Here's an easy test. Test with the Bing toolbar installed but with Bing and some other search engine (blekko, whatever).

This should help establish if it's the toolbar that is sniffing.

If so, while it may be questionable behavior, Bing would not be copying Google's results.


I had a front row seat for this test. I believe the experiment we ran provides conclusive proof. I'm on a panel with a representative from Bing later today and I'll ask Bing about this directly.


It's embarrassing for Google to complain about this. You FINALLY get a little competition on your turf and you try to make some big issue that, as a market leader, the product you produce is being watched, analyzed and in some ways incorporated by your competitor.

There is no victim here. They are not taking your 1st result and copying it. They are taking the result the user clicked. Obviously you didn't predict that with your algorithm or you'd have always made that the 1st result. Instead, what they're tracking is user behavior, not your raw ranking.

Obviously users give Google implicit permission to track their behavior by using your product. And similarly, by installing the Bing toolbar, they're giving Bing that permission.

This is beneath you Matt and it's beneath Google.


In short, the Bing Toolbar infers a relationship between words on the page and the next page the user clicks on. Google's team purposefully confined the Bing Toolbar's behavior-tracking algorithm to their use of the google.com search result page, and then cried foul about "bing stealing Google's search results".

This is disgraceful attention-whoring on Google's part. Quite surprising, too, as I don't remember them ever stooping that low.


They may just be using terms in the referring URL rather than the referring page. It'd probably yield better data. Most site-specific search pages would have the search term in the URL.

I wouldn't be surprised if they're more interested in data from domain-specific sites like epicurious than generic search sites like Google.

I'd also guess that this won't work once the SEO guys figure out they can feed fake clickstreams to MS.


Referring URL? Oh right, Google intercepts all search result clicks through a redirector, and includes the search term in the intercepting URL. Yeah, the unique words in that URL could figure into describing the content of the final destination page (after redirect), the same way anchor text figures as well.
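A minimal sketch of pulling terms out of a referring URL, assuming the query travels in a `q` parameter. The URL and parameter names below are made up for illustration, not captured from Google's redirector or from the toolbar:

```python
from urllib.parse import urlparse, parse_qs

def query_terms_from_referrer(referrer, param="q"):
    """Return the search terms found in a referring URL, or [] if none."""
    parsed = urlparse(referrer)
    values = parse_qs(parsed.query).get(param, [])
    # parse_qs already decodes '+' and %xx escapes, so split on whitespace.
    return values[0].split() if values else []

# Hypothetical redirector URL; the layout is an assumption.
referrer = "http://www.google.com/url?q=mbzrxpgjys&url=http%3A%2F%2Fwww.rim.com%2F"
print(query_terms_from_referrer(referrer))
```

Nothing here is specific to one search engine: any site that puts the query in its URL would feed the same function, which is the point about site-specific search pages.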


Just because you are big, doesn't prohibit you from exposing issues.

We really don't know how much value Bing puts on clicks made on Google. Perhaps a lot?

What's Google's official stance on this?


Even Matt's allegation is softened by "I believe" here: there appears to be nothing that conclusively indicates Bing is solely targeting Google. For example, the observed behaviour could be a side-effect of a generic algorithm to extract and associate search queries with a user's click stream, which is only a minor variant of what Google itself does with its own toolbar.

If the case described above were true, then all Google has done here is to make inconclusive accusations and use the occasion to highlight its own dominance over search.

It seems to me this is just a cheap and slightly seedy PR stunt.


Associating search queries and click stream behaviour is fine if it's your own search engine. Doing it for someone else's search engine isn't (regardless of who's doing it).


In the hypothetical situation above, it has almost nothing to do with the search engine - it is both the user providing the query and selecting the result: this is the data of value, not which intermediary provided the list of results to select from.


Really? Doesn't the intermediary that narrowed the list from several billion possible matches to the best 10, including correcting the inherent spelling errors in the query carry some value?


> Associating search queries and click stream behaviour is fine if it's your own search engine. Doing it for someone else's search engine isn't (regardless of who's doing it).

Why? This seems like a great idea.


I don't think it's embarrassing to point out that Microsoft is playing dirty.

They played dirty with Netscape/IE in the 90s and look what happened.


They have a very long history of playing dirty. Lotus learned it, MS took information Lotus shared with them and then shared it with Excel and Office and they supposedly kept Lotus on an API changing treadmill. Digital Research learned it, MS wrote code that made Windows 3.x crash if it detected DR-DOS. Netscape learned it. Arguably, IBM(OS/2) and any other operating system vendor learned it in the 1990s as well, MS charged premiums if hardware vendors wanted to install non-Windows operating systems. They sort of tried to do it to Intuit, they made a competitor and then effectively gave it away for free. Enough so that a lot of folks avoid Mono like it's, well, actually mono. They've established that reputation, and most of the time, by the time it became clear what was going on, MS had already done irreparable damage.

It is kind of embarrassing for Google, but if it is real and it continues, it's better to address it now rather than after MS becomes a titan of search and Google's market has eroded. At times, it seems like MS has changed in ways, but fundamentally they're still run by the same guys. Remember that when you play your Xbox or use Bing or any MS products, they don't like to see other successful software companies.


Also there's the strange fact that Bing and their whole online division makes gigantic losses. They're not in it for the money, they're in it to stifle competition and hold back progress so they can milk their cashcow some more.


I think they are complaining because it could be far more widespread: it would actually be easier for the head than the long tail. Where it's harder is for News, and Bing appears to lag for recent results.

I remember that when Bing went out, everyone was wondering how close to Google the results were (and talked about it as a good thing).


If it's beneath Google to complain about Microsoft riding on its coattails for the highly valuable "long tail" of queries, surely it's beneath Microsoft to sue Android manufacturers for competing in the smartphone space?

http://www.ft.com/cms/s/2/8b1ecaa2-cdb2-11df-9c82-00144feab4...

Would it have been better if Google had jumped straight to the questionable lawsuit part, like every other company seems to do when threatened on its own turf?


Let me try to understand what happened. For some obscure searches that return no results normally, a handful of users searched on Google using IE's toolbar and then clicked on bogus results.

Those bogus results made it to Bing's results eventually.

Ok... So it proves Microsoft analyzes the toolbar behavior and when it has no other data, it will therefore look like a copy of Google search.

Sounds fair to me. Do you want to get into a discussion on how exactly Google tracks you online?


"Those bogus results made it to Bing's results eventually."

... in 7-9% of cases.


in 7-9% of the times google tried to spam bing. not in 7-9% of search results. this indicates that maybe google isn't that great at figuring out how to spam bing or that bing is pretty good at defending against spam. maybe google could take some lessons from bing on cleaning up spam, a problem that seems all too prevalent on google these days.


There is, however, the legitimate complaint that they apparently do not finally have a little competition on their turf. Their competition is cheating, not innovating. That helps who, how? At best, for "competition's" sake, Bing nabs a big share of the market; now there are two big dogs who make it hard to enter the search realm with new ideas.


As we all know, this isn't the first time Microsoft has copied someone else. And I'm sure it won't be the last.

I think Google has a right to complain. Microsoft has resorted to these less than innovative tactics to monopolize themselves for a long time now, and it isn't fair to companies like Google who have worked their butts off (and gave 1.8 million shares - $336M in 2005 - to Stanford for the PageRank algorithm) to develop their superior product.


You clearly didn't read the whole article. If you did, you clearly didn't understand the article. Step back a little bit, fanboy. Microsoft wasn't copying anyone here.


Oh I definitely did read it in its entirety and understood it perfectly. What you're failing to do is see the whole picture.

Let's put it this way... if Google hadn't bought the PageRank algorithm from Stanford and put years of work into perfecting their search results, Microsoft wouldn't have any way to track which Google search results users click. It's an unfair tactic that clearly demonstrates Microsoft's sketchiness and desire to monopolize themselves (by any means necessary, "evil" or not) wherever there's a computer.

As for the fanboy comment... I'm certainly not a fanboy but I'll let the following speak for itself: Microsoft Internet Explorer vs Google Chrome


All your comments are coming from your assumption that Microsoft is trying to monopolize in something - in this case, search. Hence, your comments (although you will disagree) are biased and irrational. Microsoft isn't trying to monopolize in anything nowadays. In fact, they can't, so they aren't even trying.

From a search engine user's point of view, I believe this whole fiasco is ridiculous. First, it's ridiculous because Google is handling this situation very immaturely. Matt Cutts should not have confronted the VP of Bing in the way he did. Second, if I were a user of the Bing Toolbar, I would have given the Bing Toolbar permission to use my behavior to polish my search results. I have no problem with that. Lastly, the experiments they did have more to do with "guessing what the user wanted" than "what PageRank does".

I've used Bing fairly often for the past 6 months because of how much spam Google's search results were returning. Now that Google has fixed (or is still working on) the spam problem, I'm starting to use Google again. However, what I noticed over the past 6 months is that Google search isn't that much better than Bing. This Bing Toolbar fiasco only applies to synthetic queries that I would never make.

Is Bing cheating? I don't think so. To me, they are just using another signal with the user's permission. However, the definition of cheating will be different for everyone else.


I disagree. From some point of view somewhere this is "standing on the shoulders of giants."

Perspective changes things here, which means no one is "right" or "wrong".


Err, by that line of thinking, Google leveraged Linux (the hard work of volunteers) to earn tens of billions and does not release the modified code for use of the volunteers. Of course they are not required to, but it isn't fair to Linux developers who have worked their butts off to develop Linux.


I strongly disagree, it's not the same analogy. Linux developers explicitly say you can use the code for free.

It's more like Linus saying you can't use the code, but Google using it anyway.

Btw, Google contributed a lot to open source projects.


>It's more like Linus saying you can't use the code, but Google using it anyway.

By installing the Bing Toolbar, users are giving permission to track their clicks. If Bing's server farm is searching Google and parsing the results then it is more like your example.


They employ Andrew Morton and Ted Ts'o, and have been working pretty hard to get the delta between their custom Linux and upstream down. I'd say they've been pretty fair to Linux developers.


I have a suspicion what you'll find is that Bing use the toolbar to match $current_page_content with $clicked_page_content. When $current_page_content contains obscure words, that becomes the only signal, and so bing's engine will naturally associate it with $clicked_page.

In other words, there's a relationship between Page A and Page B if there exists a link between them (==PageRank). But the strength of the relationship is increased based on how many users click on that link. I think that's the information Bing were trying to capture (or if they weren't, they should have been).

What I'm saying is that it's probably an unintentional side-effect. At scale though, the effect is that Bing gradually uses Google as a signal, simply because Google is a popular site.

edit: Yet another way of saying it: I think it's not just clicks on Google searches that are captured by Bing, but clicks anywhere. Google is a large site, so its influence on Bing can be measured. This is what we're seeing. My theory. I don't work in search.
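To make the theory above concrete, here is a toy sketch of how "words on the current page" could end up associated with "the page clicked next". It is only an illustration of the guess in this comment, not Bing's actual pipeline; the function names and data are invented:

```python
from collections import Counter, defaultdict

def build_associations(clickstream):
    """Map each word seen on a source page to counts of the pages clicked next."""
    assoc = defaultdict(Counter)
    for source_words, clicked_url in clickstream:
        for word in set(source_words):
            assoc[word][clicked_url] += 1
    return assoc

def best_page(assoc, word):
    """Return the most-clicked destination for a word, or None if unseen."""
    if word not in assoc:
        return None
    return assoc[word].most_common(1)[0][0]

# Toy data: (words on the page the user was on, URL the user clicked next).
clickstream = [
    (["mbzrxpgjys", "search"], "http://www.rim.com/"),
    (["mbzrxpgjys", "search"], "http://www.rim.com/"),
    (["weather", "search"], "http://example.com/weather"),
]
assoc = build_associations(clickstream)
print(best_page(assoc, "mbzrxpgjys"))
```

For a word that appears nowhere else, the click counts are the only signal there is, so the clicked page wins by default. A common word would be drowned out by everything else known about it.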


Exactly. If they are just matching (even more simply) $search_term_entered to $clicked_link then you would expect that they are "copying" from any search engine configured in the toolbar.

Now the interesting thing to reverse engineer is what other information might be passed along to give relevance to the search term/click pair. If Google could establish that there was a third piece of info in the tuple, such as "originating search domain" and that Bing used this to weight term/click pairs based on the authority of the source, Google's claims would hold more water. I suspect that Bing has to apply some kind of validation of the term/click pairs (for instance, only sending pairs that appear on the same results page from accredited engines), otherwise they would be subject to "Bing bomb" attacks where users or botnets vote up lower ranked (or even unranked) clicks for a given term. (And if they don't validate or detect gaming, then there would be ample opportunity to inject all kinds of synthetic behavior into Bing's search results. Based on the relatively few number of users and clicks it took to own a long tail term, it seems like the protection they have is very weak or simple.)
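One simple form of the validation speculated about above would be to trust a term/click pair only after enough distinct users have reported it. A hedged sketch; the threshold, the event layout and the function name are all assumptions, not anything Bing is known to do:

```python
from collections import defaultdict

def trusted_pairs(events, min_users=3):
    """Keep (term, url) pairs reported by at least min_users distinct users."""
    users = defaultdict(set)
    for user_id, term, url in events:
        users[(term, url)].add(user_id)
    return {pair for pair, seen in users.items() if len(seen) >= min_users}

# Toy events: (user id, search term, clicked URL).
events = [
    ("u1", "mbzrxpgjys", "http://www.rim.com/"),
    ("u2", "mbzrxpgjys", "http://www.rim.com/"),
    ("u3", "mbzrxpgjys", "http://www.rim.com/"),
    ("u1", "cheap pills", "http://spam.example/"),
    ("u1", "cheap pills", "http://spam.example/"),  # same user voting twice
]
print(trusted_pairs(events))
```

Counting distinct users rather than raw clicks stops one person from voting a pair up, but a botnet with many identities would still get through, which matches the worry about weak protection on long tail terms.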


This makes a lot of sense, and would have been easy enough for Google to test as well: create some tiny, brand new, never-before-heard-of test search engine that Bing would have no reason to copy, and see if the same thing happens.


That would be a nice way of testing it.

edit: I'm not even sure if it's only search engines that are being analysed by Bing or all pages, but it's possible that it is just SEs - they could be capturing query terms distinctly.


They also probably should have tested to see if it happened with results other than the ones in the #1 spot.


Also, why did the experiment succeed for only 6 or 7 of the 100 terms that they tried? There's more than what meets the eye here, regardless of the hype and everyone jumping on the bandwagon.


The article suggests that the Bing toolbar monitors what its users click and uses that information to improve Bing search results. Is that what you have conclusively proved?

I'm interested in another experiment. If you set up a honeypot, search for the term, but never click on the link, does the honeypot start showing up in Bing? The article doesn't say whether you tried this. Did you try it? Are Bing scraping your results from the page or only tracking their users' clicks?


Anyone can test that Microsoft's software sends the clicks back to Microsoft, although I believe Microsoft sends the data back by SSL, so it's harder to verify even that than you'd expect.

Google's search results are blocked in robots.txt, so I don't believe Bing has been able to crawl our search results directly. All the evidence points to users' clicks on Google, which are then sent to Microsoft.

Microsoft has (so far) declined to admit whether our allegation is true. Getting them to talk about exactly what they do and what software they use or don't use would be the easiest way. I'd like them to confirm or deny, which is why I wanted to go to this search panel later today and ask them.


> so I don't believe Bing has been able to crawl our search results directly

Isn't compliance with robots.txt more of a voluntary thing?

I'm not accusing MS of ignoring it when convenient, but if you/we/someone is accusing them of acting unethically wrt search results in the first place, telling the crawler to ignore robots.txt wouldn't be that far away, would it? (And likewise faking the user-agent, etc.)

For better or for worse, UA identification, robots.txt compliance - all those things are voluntary. I'm not suggesting they shouldn't be, but it certainly makes a difference in terms of whether something's possible or not. (And, if you ask me, places an even higher obligation on the actors to behave ethically, lest trust completely evaporates and the whole thing goes to hell in a handbasket).
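To illustrate the "voluntary" part: a well-behaved crawler checks the rules itself before fetching, and nothing on the server side enforces the answer. A small sketch using Python's standard `urllib.robotparser`; the rules and the bot name are invented for the example and are not Google's actual robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only.
rules = """\
User-agent: *
Disallow: /search
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

def polite_can_fetch(url, agent="ExampleBot"):
    """A polite crawler asks first; skipping this call is all it takes to ignore the file."""
    return parser.can_fetch(agent, url)

print(polite_can_fetch("http://www.example.com/search?q=test"))
print(polite_can_fetch("http://www.example.com/about"))
```

The check lives entirely in the crawler's own code, so compliance comes down to whether the operator chooses to call it.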


I am not a lawyer, but as I understand it there is some precedent in the US of intentionally ignoring robots.txt being unauthorized computer access, exposing you to all the liability that entails (possibly criminal).


I'd like to see an actual case reference for this. I've never heard of ignoring robots.txt resulting in any kind of legal action.

It would take a pretty big leap to go from robots.txt is advisory to ignoring it constitutes a criminal action.


Internet Archive was sued unsuccessfully. As I understand it a lawsuit is still in process against Google on the topic. So I guess the precedent is weaker than I thought, but still: tread carefully.


Matt, don't Google Toolbar and the Chrome Browser similarly send information to Google for use in improving their services?


If you read the article and other comments here it's been made perfectly clear that the Google toolbar and Chrome browser are not sending similar data back to Google.


Ah, at least the google toolbar does. If you enable PageRank on the Google Toolbar it sends back all the urls you visit just like the bing toolbar.

From the toolbar privacy policy: "Toolbar's enhanced features, such as PageRank and Sidewiki, operate by sending Google the addresses and other information about sites at the time you visit them."

Google has managed to demonstrate one way MS appears to be using the data. What does google do with their trove of data? That's a lot of data to collect and not do anything with.

If they want to make it perfectly clear, they should add it to their privacy policies and EULAs.


Yes absolutely. I don't think anyone in this thread or in the article denied that the Google Toolbar sends data to Google. And you are absolutely right that Google's use of the data collected should be clearly stated in a privacy policy and EULA. It might be, I haven't read them.

But the article clearly covers the available public statements on this issue and patio11 dug up a post from Matt Cutts in his comment below that directly addresses this: http://www.mattcutts.com/blog/toolbar-indexing-debunk-post/.


I did not say "similar data" because "similar" is a bit too slippery a word in a technical context. There's too much plausible deniability. What I am asking is if Google's tools send data back to the Googleplex to be mined for the sake of search engine improvements.


Then what use is the word "similarly" in your comment? Similarly send? As in via HTTP requests? I think that's either obvious or irrelevant or both.

Again, if you actually read the article, you will come across the section titled "What About The Google Toolbar & Chrome?" I encourage you to read it.

[edit] Also, see this comment and patio11's subcomment further down the page, both of which were written an hour before yours: http://news.ycombinator.com/item?id=2165469#score_2165578.


Quote from the article: "In fact, Google stressed that the only information that flows back at all from Chrome is what people are searching for from within the browser, if they are using Google as their search engine."

I'm pretty positive that's not true. If you run Fiddler when browsing with Chrome you will see constant hits to toolbarqueries.clients.google.com whether you're using Google or not. I could be browsing some MS site and toolbarqueries.clients.google.com gets hit. Chromium doesn't do this.

Edit: You can uncheck everything under privacy and it will still send those requests.

Edit2: What it sends back looks something like this:

<?xml version="1.0" encoding="UTF-8"?><autofillquery clientversion="6.1.1715.1442/en (GGLL)"><form signature="8551191143090325242"><field signature="620769395"/><field signature="2995202485"/><field signature="2175865763"/><field signature="904516291"/><field signature="2953051246"/><field signature="2649047790"/><field signature="2308153337"/><field signature="1003471793"/><field signature="3255484099"/><field signature="1305698505"/><field signature="3676143819"/><field signature="1275502930"/></form></autofillquery>

Looks like auto-fill data, but this happens when I click around a site, NOT when searching Google or typing something in the address bar. For some sites (interestingly, not all) it sends 3 requests for each page load.
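
For anyone who wants to pick apart a captured body like the one above, here is a minimal Python sketch. It assumes the request body has been saved as a string; the sample is trimmed to the first three field elements of the capture, and `summarize` is just a name chosen for illustration. It only shows what is in the payload, not what the server does with it.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the captured request body (first three <field> elements only).
payload = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<autofillquery clientversion="6.1.1715.1442/en (GGLL)">'
    '<form signature="8551191143090325242">'
    '<field signature="620769395"/>'
    '<field signature="2995202485"/>'
    '<field signature="2175865763"/>'
    '</form></autofillquery>'
)

def summarize(xml_text):
    """Return (client version, [(form signature, [field signatures])])."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    forms = [
        (form.get("signature"), [f.get("signature") for f in form.findall("field")])
        for form in root.findall("form")
    ]
    return root.get("clientversion"), forms

version, forms = summarize(payload)
for form_sig, field_sigs in forms:
    print(version, form_sig, len(field_sigs), "fields")
```

Nothing in the body is a URL or typed text, which fits the "looks like auto-fill data" reading, though the signatures are still tied to the page being viewed.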


That's troubling. I'd be very interested in seeing a response from Google about this. Are you aware of any? Also, can you use Fiddler to inspect the content of the requests? I'm not familiar with the tool.


I see this too, if I have autofill enabled, and at least one autofill address entry.

I would guess that Chrome is sending a hash of the <form> (perhaps URL + method?), plus a hash of each of the <input> tags, and Google returns some sort of information about what kind of form it is?

If so, it would mean it's pretty easy for Google to determine which sites you're on from the pattern of hashes sent for each site. e.g. I see this data sent in the clear for pretty much every page on https://www.facebook.com/
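
To illustrate that concern, here's a rough Python sketch. Everything in it is an assumption: CRC32 of the field name stands in for whatever hash the browser really uses, and the sites and field names are made up. The only point is that a stable set of per-field hashes works as a fingerprint for a page's form.

```python
import zlib

def field_signature(name):
    # Hypothetical stand-in for the real (unknown) field hash.
    return zlib.crc32(name.encode("utf-8"))

def form_fingerprint(field_names):
    # The set of field hashes a client would send for one form.
    return frozenset(field_signature(n) for n in field_names)

# Made-up forms for two made-up sites.
known_forms = {
    "site-a.example/login": ["email", "pass"],
    "site-b.example/checkout": ["name", "street", "city", "zip", "card"],
}
index = {form_fingerprint(fields): site for site, fields in known_forms.items()}

def guess_site(observed_hashes):
    # Server-side lookup: which known form produces exactly these hashes?
    return index.get(frozenset(observed_hashes))

sent = [field_signature(n) for n in ["email", "pass"]]
print(guess_site(sent))
```

Whoever receives the hashes never sees a URL, but a table of known forms is enough to recover the page, which is why sending them in the clear from an https page looks leaky.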


Is this malicious site detection by any chance, or does that use a different mechanism?



>I believe Microsoft sends the data back by SSL, so it's harder to verify even that than you'd expect.

Please. Adding my own SSL cert to my own laptop is not harder than I'd expect. Certainly not harder than many other things you did in setting up this experiment.


are you claiming that google never scrapes bing search results pages? or any other search result pages?


poacher69, we crawl the public web. Anyone that blocks us out with robots.txt, we won't crawl. If you check bing.com/robots.txt, it has "Disallow: /search" . So no, we won't crawl Bing's search results pages. If anything, users tend to complain when search results from Lycos or wherever show up in Google.


http://www.bing.com/robots.txt

  User-agent: *
  Disallow: /search

Funny thing: http://www.google.com/search?q=site%3Abing.com%2Fsearch%2F

I was gonna call out Matt for crawling bing's search results but I'm guessing Microsoft hasn't realized they return results from the /Search/ folder. ;)


Once again Microsoft is bitten by expecting case insensitivity.
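
The case-sensitivity point is easy to check with Python's standard-library robots.txt parser. This is a sketch using the two rules quoted above; it shows how `urllib.robotparser` matches paths, which is not necessarily identical to Googlebot's own matcher, and the two test URLs are made up.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted from bing.com/robots.txt in this thread.
rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /search"])

# Path matching is a case-sensitive prefix test.
lower = rules.can_fetch("Googlebot", "http://www.bing.com/search?q=test")
upper = rules.can_fetch("Googlebot", "http://www.bing.com/Search/test")

print("may fetch /search :", lower)
print("may fetch /Search/:", upper)
```

With these rules the lowercase path is blocked and the capitalised one is not, so a crawler that obeys robots.txt to the letter is still free to fetch anything under /Search/.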


matt, how does google do competitive relevance evaluations without scraping Bing?


From my experience, Googlebot doesn't crawl pages that are blocked in robots.txt files. Check out Bing's robots.txt: http://bing.com/robots.txt - notice how /search is disallowed. That typically means that Googlebot isn't able to access that page. The same for the other search engines, it's more down to if they specify (through robots.txt) that Googlebot isn't allowed to crawl those results.


Given that Google appears to have an active program to monitor the results of search queries on Bing and to track Bing's page rankings and the ways in which they change over time (how else is this more likely to have come to their attention?), they should hardly be shocked, shocked to find Microsoft doing something similar.

[edit] I have always suspected that the real value of Bing for Microsoft is to prevent Google's data mining of queries originating in Redmond.


... and feeding that data back into their own search results, rather than just using it for analysis to see how your competitors are performing?

If I'm in the business of giving horse racing tips and I read your tips to see what your strike rate is compared to mine, that's one thing. If I start tipping the same horses as you, purely because you tipped them, that's quite another thing.


Microsoft isn't using strike rate data in the important sense which you imply - the strike rate for Google is advertising revenue. It's not as if Microsoft is collecting info on the advertisements displayed and then soliciting those advertisers to spend their dollars on Bing (at least that's not part of the allegations). I am pretty confident that Google feeds every bit of legally collected relevant data back into their search algorithms.


no, the "strike rate" is precision/recall. Good advertising CTRs are a side-effect of relevancy. Relevancy is measured by precision and recall.
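
For anyone unfamiliar with the terms, a minimal Python sketch of the two measures for a single query. The document IDs are made up, and real evaluations average over many queries with human relevance judgments.

```python
def precision_recall(returned, relevant):
    """Precision: share of returned docs that are relevant.
    Recall: share of relevant docs that were returned."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up example: the engine returns four docs, three docs are actually relevant.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
print(f"precision={p:.2f} recall={r:.2f}")
```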

edit to expand: If the measure by which a search engine evaluated itself was advertising revenues, they'd all have massive intrusive adverts, and no users. The only viable measure can be the quality of the search results themselves. As a happy coincidence, if you build something capable of delivering high quality results, you can very easily use that to produce highly relevant adverts. Imagine that each advert is like a little webpage, and rank them just the same as you do for normal webpages. (caveat: there's no link graph for adverts, so we're reduced to using a simpler text mining approach, eg bag of words vector space la-di-da).
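
A toy version of that bag-of-words idea in Python, as a sketch only: the ad texts are invented, whitespace tokenisation and raw term counts are simplifying assumptions, and a real system would at least add term weighting (e.g. TF-IDF).

```python
import math
from collections import Counter

def bag_of_words(text):
    # Lowercased whitespace tokens with their counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def rank_ads(query, ads):
    # Treat each ad as a tiny document and order by similarity to the query.
    q = bag_of_words(query)
    return sorted(ads, key=lambda ad: cosine(q, bag_of_words(ad)), reverse=True)

ads = ["cheap flights to paris", "paris hotel deals", "discount running shoes"]
ranked = rank_ads("hotel in paris", ads)
print(ranked)
```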


I believe that Google and Microsoft evaluate their search engines by entirely different measures. Google primarily by advertising revenue, Microsoft primarily by preventing searches using Google. Sure the ad money is nice for Microsoft, but they don't need it for their business to be profitable and they would be doing major research into search anyway because of its importance to businesses - they sell databases after all.

This whole episode points to the sort of counter-espionage operations the two companies are engaged in. Look how important a propaganda victory is for Google? It strains credulity to believe that the release of this information on the day of the panel discussion is pure coincidence.


Microsoft, historically, have been kings of the desktop. With the rise of the web (which post-dated MS's rise), the desktop has become less and less relevant. Google is fast replacing them - my email, documents, search, advertising, analytics is all handled by Google. I don't use Microsoft for anything in my day-to-day life. Even on my main windows machine, my files are in my dropbox, outside of MS's control.

MS are desperate to regain control. Google will soon launch their own web-centric OS properly, and bam, MS will have no business apart from selling to an ever-dwindling number of companies who can't believe MS don't rule the roost any more. In 20 years they will simply cease to exist if they can't come up with a world-beating online product and win back control of people's computing lives.

Notice how they're diversifying into games and search in order to prepare for the worst case; that their core OS and 'boxed software' business fails.


Which may explain why Google is very successful and Microsoft is forced to buy its customers (and have huge losses in that division).


I think you may misunderstand my position. Bing is basically a research project for Microsoft. The ad revenues don't really matter, the data they collect does - it's learning v. earning. It makes sense for Microsoft to spend a billion dollars because better search algorithms have application for their B2B products and services. If they recover some of their R&D costs directly through advertising revenue, that's a windfall to the overall bottom line.


Bing copying Google's search results is just evil. Not like copying Apple's iPhone design and user interface and giving it away to Apple's competitors, which is good. Right?

It's ugly and immoral and probably legal. Good job catching Microsoft at it (and I think the really really unethical and scary bit is that Microsoft is cheerfully stealing info from users via their browser). I also realize that Google got where it is in Search by innovation and iteration, and that Google's search team has nothing to do with Android per se, but you might see how Apple people feel about the business empires built on stealing their ideas.


I don't think Android devices being similar in some ways to the iPhone is even remotely analogous to what Bing is doing here. One is called healthy competition (and I don't see how Android is copying iOS). The other is literally just copying data.


The original Android phone design looked like this: http://tctechcrunch.files.wordpress.com/androidphone.jpg

Then iPhone came out and it looked like this: http://km.support.apple.com/library/APPLE/APPLECARE_ALLGEOS/...

Now Android phones look like this: http://farm3.static.flickr.com/2795/4208849005_dd4b608729.jp...

You don't see any signs of copying here?


They are similar in that they both are big touchscreens with no physical keyboard. Apple did not invent that by any means. That is also a shot of the Android App Drawer and not an actual homescreen. Most Android homescreens I've seen have a few widgets on them and do not look anything like a big grid of app icons like the iPhone. Also, that "original" Android phone design you point out looks a hell of a lot like a Blackberry. Android must have blatantly copied RIM by your logic.



Yeah, point to Palm -- a company built by ex-Apple people that produced a lower-cost Newton clone.


FYI Android started before iPhone was released. It's based on the Sidekick.


I usually criticize Microsoft, but I am on their side in this case.

Why shouldn't Microsoft be using this kind of data? Google search result pages are part of the internet just like any other publicly available web site. Microsoft monitors what the users are clicking on Google and probably on Bing and other sites. So what? Monitoring users is not a new thing. It may be unethical and I may personally hate it, but almost everyone is doing it.

Google should stop whining about this and make their search result the best they can. If they had the best search engine, Bing could come close, but never overcome Google by just copying part of it.


Great panel session. One of the best I've seen. I think you made your point, although I think MS did a good job neutralizing it too.

My question for you Matt... is there any way for Google to build a toolbar that effectively does what the Bing toolbar does (or even a joint one?). I jump to use the various search engines because no single search engine is sufficient. But clearly when Google isn't sufficient, you don't get the value of when I go to Bing. And vice-versa (as I don't use the Bing toolbar currently, but that may change now). Or do you feel that with 65% of the market, you don't need this info?


I can't argue about legal vs. illegal. I sure would like to tell the Bing team that they should try to better Google in their attempts. This strategy is mere copying in order to say 'look, I am as good as Google'. Microsoft never learns from the past, do they?


Please keep us posted as best you can.


I believe the search panel with Google and Bing will be live-streaming, so anyone can tune in. It's by a group called Big Think.



I couldn't find the streaming video from there. Did anyone find it?


According to the description it'll be live at 10AM PST (a little before this comment is 2 hours old).


I guess they'll put the streaming video up when it starts (10am PST, so in just under 2 hours). That seemed like the most appropriate page to link to.


Looks like the site can't handle the traffic as of now (1.20pm ET).


You are just jealous that your engineers haven't thought of this yourself.

I bet it is just recording general searches (input query) + clicked links. A pretty good idea.

And don't tell me you are not using the results from the google toolbar to rank the sites in google search.


Is there a chance this panel is being recorded? I'd love to be a fly on that wall :)


Oops, sorry. Found the link elsewhere in this thread!

http://news.ycombinator.com/item?id=2165861


I can't help but feel a little surprised that Google found it ethical to lie on their search results page for any reason; and clearly this trojan horse page was a lie. Certainly at every search engine I've worked at we always said, "We can put up ads and help, but we can never outright lie." I suppose that's ameliorated by the fact that this was an internal experiment, but still..

The temptation to abuse that power is pretty big.


I thought this was the most interesting part:

> The day after that, Bing contacted me. They were hosting an event on February 1 to talk about the state of search and wanted to make sure I had the date saved, in case I wanted to come up for it. I said I’d make it. I later learned that the event was being organized by Wadhwa, author of that TechCrunch article. [emphasis mine]

So the supposedly independent author of an article on TechCrunch that kicked off a massive wave of Google criticism is, less than a month later, organizing events specifically for a Google competitor? Boy, that sure seems above-board.


Matt Cutts from Google and Rich Skrenta from Blekko are also speaking at that event (not to mention Peter Thiel, Esther Dyson, and Malcolm Gladwell). It's an industry event that happens to be sponsored by Bing, it's not really an MS event though.


Still, a TechCrunch & BusinessWeek contributor that gets paid by Bing is interesting.


and techcrunch has long been hosting google ads. so.... what?


Uhh... Yeah? Everyone in search does this. I've worked at and with 3 major search engine initiatives, and we all tested heavily against Google in a variety of ways.

But the article definitely gets a few things wrong. For example, having worked at Bing I can tell you this: in general "obvious" misspellings are autocorrected without comment. It's not some sort of magical copying procedure, it's actually a policy. Want proof? Here's an example query you can repeat: http://fayr.am/4KdG (direct query link: http://fayr.am/4JZD)

But otherwise, shit yes everyone is scrutinizing google trying to figure out what they're doing. That doesn't mean other players aren't doing their own optimizations, or even running relevancy metrics against other search engines. Relevancy is not a concept with fixed metrics, and every player in the search market does everything they can to figure out what their competitor is doing.

And even the raw results leakage is fairly par for the course. It's not like Bing searches are a crawl of google searches; Microsoft gets this data from browsers running this toolbar and uses it to help shore up queries where they don't return good results.


No surprise, as both DDG and Blekko disclose that they use Bing for long-tail queries, but it works at both of those engines, too:

  http://duckduckgo.com/?q=hiybbprqag
  http://duckduckgo.com/?q=mbzrxpgjys
  http://duckduckgo.com/?q=indoswiftjobinproduction

  http://blekko.com/ws/hiybbprqag
  http://blekko.com/ws/mbzrxpgjys
  http://blekko.com/ws/indoswiftjobinproduction


Ouch, you're totally right. So you're saying that because Blekko and DDG are powered partly by Bing, the rankings on all three engines are showing the results that Google inserted?


They shouldn't be in DDG. That's one of the pieces of intelligence we try to add on top. We try to not "correct" your query. Here's another example of a real search my wife did the other day that illustrates this point on both Google & Bing: http://duckduckgo.com/?q=suttle+renal vs. http://www.bing.com/search?q=suttle+renal vs. http://www.google.com/#&q=suttle+renal


I love that you don't try to correct queries. I hate when Google auto-corrects my query and I need to click another link to get to my actual query. Thanks for that!


But in this case DDG does give Google's auto-corrected results (without even telling you!) and I bet it happens with many terms, as it is mostly a proxy/filter of Bing.


No, actually they shouldn't be in there. Are you seeing them? If so it is a bug. It is not showing up on my computers.


I'm still seeing it on tmbg, but thor and www seem fine:

http://tmbg.duckduckgo.com/?q=hiybbprqag

http://thor.duckduckgo.com/?q=hiybbprqag

http://duckduckgo.com/?q=hiybbprqag

Edit: The same appears to be true for mbzrxpgjys and indoswiftjobinproduction

Edit 2: Hey, that's weird. Adding a comma, semicolon, period, or other symbol to the beginning or end of the query makes the gamed results show up on top at thor and www as well. Seems to work for all the terms at issue:

http://duckduckgo.com/?q=hiybbprqag%2C

http://duckduckgo.com/?q=----hiybbprqag----


You're finding the edges of my intelligence layer :). Should be fixed now.

Note that some of those servers don't get updated often.


Ah. They were loading the same results as Google and Bing when I posted that. They aren't now.


There were a number of bugs around the first result that this whole thing uncovered, so it could have been intermittent. Our backfill in the case of no results can vary. It shouldn't have been showing anything ever beyond the first result though.


Thanks for the clarification. I use DDG as my primary search, and love reading about how it works.


It was there using those links when I posted, but not anymore.


Or you just "quote" your original misspelled query and Google will search for the misspelled version without any additional click.


It wasn't a misspelling though! That's the point. She is a cancer researcher and she meant that. Renal is for kidneys and Suttle is an author on various papers in the field.


Quotes provide you with exact match results. Putting quotes around an entire phrase isn't what I intend sometimes, and quoting each term in the query is just excessive. In that case, I'll just click the extra link...


A timesaver that might be useful on the edge cases is to prefix terms with a plus sign. +the +quick +brown +dog is easier to type than "the" "quick" "brown" "dog", and should have the same results.


You're totally right, nice find.

Just happy that my current SE favourite does not seem to copy and to actually have their very own results: http://entireweb.com/#q=hiybbprqag


But the 3rd result (at the moment) is "Vanessa fox Nude"...


Well, I'm not seeing that result now, likely because the results are now filled with actual news about hiybbprqag. But if Vanessa Fox Nude was ranking then that likely means that some signal was associating hiybbprqag with Google and that either they're using (at least in part) a really old index or the crawler they're using doesn't follow redirects very well.

vanessafoxnude.com has been redirecting to my current site for several years now, but back when the original site was active, much of the incoming anchor text was related to Google and search.


It may not be illegal or 'cheating', but it's incredibly stupid for the same reason cheating is: Without the person you are cheating from, you can't pass the test!

In this case, the customers don't get relevant results unless other potential customers use the competition! In short, Bing's results are only good if Google is popular.

Why would you invest time relying on your competition? Shouldn't you be striving to match or beat them, rather than trying to piggy-back on them?


Isn't this just the McDonald's v Subway/Burger King example? McDonald's has the research and foot traffic. Rather than do your own, watch where the successful McDonald's go and then put your restaurant across the street.

If you have a bunch of users searching for "XYZ" on a different search engine and consistently going to link A -- wouldn't that imply it was relevant? You'd do the exact same thing for searches on your own search engine. The only difference is people have opted in to allowing you to have this info _implicitly_ by going to your search engine vs giving you this permission _explicitly_ by clicking through the EULA for the toolbar.


Indeed. Drucker somewhere makes the point that a business's key activities are innovation and marketing -- which implies you don't necessarily have to be innovative in your marketing, just good at it. Hmmm... I was just thinking that rather than do my own exhaustive search for startup companies to invest in, I'll just check who has gotten support from Y Combinator and offer them a deal, piggybacking off of Paul Graham's work. Wait, it's been done? Oh, never mind.

As a practical matter, I doubt customers will care so long as they've always had the option of turning off that part of IE's behavior. I mean, when did you last care about the authenticity of your phone directory's information?


That sounds all well and good, but in terms of cost vs. benefit, it's way easier to make a good application by stealing the years of hard work of the industry leaders than it is to reinvent everything and try to come up with your own clever tweaks to improve it. If you could do it without getting caught, it would practically be a no-brainer.


Yes, but it is one thing if a small startup does it, and it's another if a giant like Microsoft does it. In a way it's applying a double standard, but I think Microsoft has enough money to invest in innovating in the search space. Ultimately you can't really prove it's illegal, so it's a matter of ethics. Strangely, we would all encourage a startup to copy what it can, so it can focus on the core innovation and not be bogged down, but we look down on Microsoft. Why is that?


Kind of indicative of where the internet is heading these days, a few big platform players which everyone else is piggybacking on. Like Windows Live Messenger attempting to stay relevant by combining Facebook chat with its own network.


I think you're missing the goal. The goal is to sell more advertising, not to get better searches.


"Is it illegal? Is it cheating? Is it unfair?" Who cares? Google already got everything it needed out of this situation: a gigantic PR win, and a morale boost for their own team. Well played.


I'm not sure how gigantic that PR win is, if I had to come to HN to hear about it.


Give it a day to percolate and then we'll see if I'm wrong. I obviously might be.


just after i read this i see an article on this on the frontpage of financial times germany

http://www.ftd.de/it-medien/medien-internet/:suchmaschinen-k...

headline: googles trap for microsoft


give it time ... that headline sells itself. If Google pushes the issue, it should be on every night news report by tomorrow night.


I'm not sure Google wants to get pulled into a conversation about automated tracking of user search data.

PR "wins" can become PR nightmares in a blink.


Told ya!


Not a PR win for Google from my point of view. I'm a huge Google fanboy (daily user of the search engine, Gmail, Google Apps, Android, Google Voice, etc), but this whole situation is a PR stain on them for me.

Firstly because I think they originally misunderstood the manner in which Bing's results were being influenced by their own, and then secondly because if they are going to complain about Microsoft collecting information about their users' usage patterns -- well, that's really, really hypocritical coming from Google. Lastly because the whole thing smacks of high school level gossip. If Microsoft is really doing something out of line, handle it in some other way than engaging in a gossipy blog war.

To reiterate, I'm actually a Google fan, I'm OK with trading some privacy for useful services, but if they are going to bang on Microsoft for collecting user usage information, well that's about the worst case of the pot calling the kettle black I've ever heard of in the tech industry.


I'm happy for Google. Now can they get back to improving their search results? The changes they made last week (http://news.ycombinator.com/item?id=2152286) did help some technical searches, but many results are still being overwhelmed by SEO spammer crap.


Got any good examples? Looks much better to me.


I agree in principle, disagree in detail; this is a loss for Bing. Google isn't going to get much further benefit out of this but Bing is going to have egg on its face.

Discussing the hypotheticals of the situation as others are doing is interesting (serious), but irrelevant. The court of public opinion isn't going to care about that nuance and will find against Bing if this goes viral. All of the other defenses won't matter either, "everyone is doing this" and so on. Public opinion won't care.


I have to agree with you. After reading all the comments in the thread it seems like the best bet for Google is to use this to make MS look bad. To use the cheating-on-a-school-test analogy: if you get caught cheating, it is a blow to your reputation.


My take is this: the whole Google ethos is that they are trying to have the best algorithm to give the best results. Outside of this sting they have always been at pains to put forward the view that nothing is manually ranked.

I think the same thing applies to Bing here: if they have a generic algorithm that ranks results based on toolbar (or other data) it could be easy to see how their data is skewed by Google given the amount of traffic Google search gets compared to the rest of the internets. This seems fine to me.

But if their algorithm does stuff with activity on google.com because it is google.com then this is a pretty clear foul - it is both essentially copying, and the equivalent of manually ranking results (specifically, Google results)

The corollary of this is that if their algorithm is generic, then it will still work if Google were to cease to exist. If it's not generic, it would be useless without Google.


When asked by SearchEngineLand, Google's Singhal seems to imply Google Toolbar clicktrail data is never used for ranking, but his wording is actually a bit vague:

> Absolutely not. The PageRank feature sends back URLs, but we’ve never used those URLs or data to put any results on Google’s results page. We do not do that, and we will not do that.

"Put any results" could be read narrowly as "this data isn't used to add pages to the index", or more generally as "this data isn't used to rank results relative to each other". Also, Singhal's pledge that "we will not do that" is much stronger than any statement I've ever seen in any Google privacy policy, which all pretty much say Google may use any info they have to improve their services.

Matt Cutts, can you clarify if Singhal in fact meant the 'narrow' or 'general' interpretation above?

And, if the 'general' meaning, then is there any statement about the use of clicktrail data in Google's published privacy policies that is as strong as Singhal's?


> It strongly suggests that Bing was copying Google’s results, by watching what some people do at Google via Internet Explorer.

Wow, it almost seems that is exactly what they are doing, which is some pretty dirty stuff. Now MS always had a shady track record, but I thought recently the company got a lot better.


You're crazy if you think every player in this space won't opt into "shady" if it means a better user experience. This is just a "shady" we get to hear about.


Like companies give a crap about "user experience" unless that means earning more and more money.

Piggybacking like this should really be copyright infringement or something, as there's nothing morally right about it.

On the other hand Google should be more quiet about this; after all, they've built their business database for Google Maps / Google Places by piggybacking on third-party services like Yelp and TripAdvisor. And now all of a sudden when I'm searching for "restaurants" I have to scroll the page to get past Google's own crap.


The real question should be why is Google not doing this. Bing seems to be learning from what results users choose and improving their results.

Seems like a no brainer, unless i missed something.

I also really like this for some reason. It's very ... gangster. Shows that bing is scrappy and willing to bend the rules.

That being said, i will still continue using Google.


not sure why i'm being downvoted. if you don't agree leave a reply.

it's very discouraging


Your comment began with +1. It is now at -1. So of all the people that looked at it, some modded it, and of all those people, there are two more people who downmodded it than upmodded it. Two.

Right now the post has +186. So assuming that half of those people read your comment, we have something like 93 people who read your comment, and there are just two more downmods than upmods.

Unfortunately, HN shows people a comment's score before they read it, which skews results. People aren't objective, they tend to upmod comments that are already upmodded and downmod questions that are already negative.

So if you are unlucky and the first one or two people to read your comment are constipated, you get a negative result through no fault of your own. Bad luck, try again.

Please don't be discouraged. Try to be helpful and constructive and to present a point of view that others may have missed in the conversation. The upmods will eventually follow. Cast your seeds and let a thousand flowers bloom.


Sort of petulant on Google's part to release this, no?

Of course your competitors are going to copy you. It's not innovative, and you might consider it 'cheating' if you forget that each and every one of us is building off of a foundation laid by other people. But it works, and that's why it happens and will continue to happen.


You are misunderstanding the word 'copy'. Here, it does not mean "3-D movies are a great idea, let's copy it from Avatar". It's more along the lines of "This code is great, let's copy it".

Copy == getting inspired by the brilliance of an idea and implementing it on your own, in the first case.

Copy == stealing, in the latter case.

Copying (Ctrl+C, Ctrl+V) Google search results is theft, not "getting inspired" by a previous body of work.

Edit - Made the PoV clearer.


No one's stealing Google's code.

Here, it means 'we have evidence, given to us by our users who agreed to share their web traffic with us, that showing this result for this query is a great idea -- so let's do that.'

People obtain data on their competitors' performance all the time and tailor their products accordingly. It's not theft, it's competitive intelligence.


> People obtain data on their competitors' performance all the time and tailor their products accordingly. It's not theft, it's competitive intelligence.

Yes, but in this case, it's more like claiming a competitor's product (the search result) as your own, directly in your product. I would consider it theft.


No it's not. That's absurd. A clicked search result is a successful product. Bing is taking note of a competitor's successful product and using that information in its own decisions on the products (serps) it produces for its users.

This is a bunch of microsoft haters making hay over nothing at all. Quit whining. It's not theft, it's not any more privacy-offensive than anything Google does, get over it.


I agree with you 100% if that's the case (and Matt Cutts' comment suggests it is). My argument was for the case where Bing crawls Google's results pages (the programming equivalent of searching on Google and feeding the results into its own DB as results for the term), not considering user click information.


When identical inputs create identical outputs using different algorithms, it's reverse engineering not copying.


Except the algorithms are not really different. One algorithm is just a wrapper over the second algorithm. The crux of the matter is really just the second algorithm.


It's not a wrapper. Bing isn't passing requests to the Google API and then returning the results on the Bing page. Keep in mind that 93/100 of the seeds Google injected into Bing's database were filtered out.


Or: "3-D movies are a great idea, let's take scenes from Avatar and put them in our movie". I don't think excessive simplification does justice to the issue. Just like how piracy isn't necessarily theft, what Bing is doing isn't necessarily copying.


Do I have this right?

1. User does a search in a Microsoft toolbar, using Google as his search engine. User is searching for $terms.

2. User gets a results page. User clicks on the entry in the results for $site.

3. Toolbar sends back to Microsoft that the $site was the first result the user chose for $terms.

4. Bing uses this to increase $site's placement in searches for $terms.

An interesting question then would be whether or not Microsoft also "copies" from Bing? That is, if you are using Bing as your search engine, do they still use the fact that you went to $site after searching for $terms to adjust the rankings?
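For what it's worth, here is a minimal Python sketch of steps 3 and 4 as described above. It is only an illustration of the idea, not Bing's actual pipeline: the click log, the site names, the baseline scores and the `weight` constant are all made up.

```python
from collections import defaultdict

# Hypothetical click log: (search terms, site the user clicked) pairs,
# as a toolbar might report them in step 3.
click_log = [
    ("hiybbprqag", "example.com/seating"),
    ("hiybbprqag", "example.com/seating"),
    ("hiybbprqag", "other.example.org"),
]

def count_clicks(log):
    """Tally how often each site was clicked for each set of terms."""
    counts = defaultdict(lambda: defaultdict(int))
    for terms, site in log:
        counts[terms][site] += 1
    return counts

def rank(terms, base_scores, counts, weight=0.5):
    """Step 4: add a click-based boost to each site's baseline score.

    base_scores maps site -> score from the engine's own ranking;
    weight is an arbitrary illustrative constant.
    """
    sites = set(base_scores) | set(counts[terms])
    boosted = {
        site: base_scores.get(site, 0.0) + weight * counts[terms][site]
        for site in sites
    }
    return sorted(boosted, key=boosted.get, reverse=True)

counts = count_clicks(click_log)
ranking = rank("hiybbprqag", {"other.example.org": 0.2}, counts)
```

Note that nothing in this sketch cares which search engine produced the results page, which is exactly why the Bing-on-Bing question matters.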


So in an effort to be as good as a competitor, MS is watching what you do when you interact with that competitor's website and sending that information home. Seems like a really big reason to suggest to anybody you know that they uninstall the Bing toolbar.


Google gathers lots of user data on 3rd party websites via services such as (to name a few):

- Google Analytics (opted in for data sharing)
- Chrome
- Google Toolbar

@Matt Cutts - I'd love it if you could confirm exactly which user data you DO and DO NOT use to influence rankings. Or, at the very least say on record that you don't do what Bing are doing and use data from bing.com

Overall, I'm not surprised that Bing are doing this for some keywords - all the major search engines use a massive number of different signals. I'll be more surprised if it turns out this is happening at a large scale or for competitive terms.


Looks like Microsoft is "innovating" again...


A great deal of value has been added to this conversation by this comment going by the upvotes.


It's a little late for me to elaborate but here's what I meant: Microsoft has a long history of copying its competitors and calling it "innovation". Absent any other evidence, that tends to put the burden of proof on them.


It's instructive to think of the cases where Google can return a search result, even though the searched word doesn't appear on the page. Most often, this occurs because another site includes an outlink to the page, with the searched word. That is, they're 'copying' a publicly-available source that indicates that word is associated with that page.
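A rough Python sketch of that anchor-text idea, assuming a toy list of links (every page name below is invented for illustration, and real engines weigh far more than this):

```python
from collections import defaultdict

# Hypothetical link data: (source page, anchor text, target page).
links = [
    ("blog.example.com", "cheap flights", "airline.example.com"),
    ("forum.example.org", "flights to Paris", "airline.example.com"),
]

def build_anchor_index(links):
    """Map each anchor-text word to the pages that links point at."""
    index = defaultdict(set)
    for _source, anchor, target in links:
        for word in anchor.lower().split():
            index[word].add(target)
    return index

index = build_anchor_index(links)
# "flights" need not appear anywhere on airline.example.com itself;
# the association comes entirely from what other sites say about it.
matches = index["flights"]
```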

I see this Microsoft tactic as similar. They're considering search terms that resulted in a visit to the page from other search engines as being important indicators of the page content. If they have that URL-to-URL-trail data legally, and the signal works well, and they are not singling out Google's URLs as the only source of such a signal, I'm not sure what the problem is.

Google didn't get where they are by throwing out legally-collected useful data, and Bing won't catch up to a leader who has clicktrail sensors everywhere, via analytics/toolbar/ads/mobile/etc., by throwing away legally-collected useful data.


TL;DR

1. Bing is inferring search results from user behavior, collected via the Bing Toolbar.

2. Google's team runs an experiment: using the Bing Toolbar to feed Bing particular behavior. Namely, they all go from a search result page on Google.com laden with a unique word to a particular target site.

3. Bing infers connection between the unique word and the target site.

4. Google cries cheating.


Wow, I'm surprised by all the developers on Microsoft's side on this one. Google spends a lot of money developing proprietary algorithms for determining search results. Microsoft is then stepping in and taking advantage of the money Google spent by copying some of their results. It's rather like someone taking the results of a Consumer Reports list and publishing it themselves. It borders on illegal, and it's definitely shady.

But what I think is more important is all of the flak that Google has been catching for supposedly slipping in its quality of search results. If its quality is so poor, then why is Bing stealing its results? It's a great method of striking back at the negative PR they've been receiving.


"If its quality is so poor, then why is Bing stealing its results?"

Bing is apparently using toolbar click data (AFAICT it hasn't been shown that this is specifically targeted at Google or even at search engines in general) when it has no other information for the given search term. That has very little relation to the quality of Google's search in general.


Google copies Bing's layout for images. Bing copies Google's search results.


The funny thing is that people with the Bing Toolbar are still using Google to do web searches.


MS has a long history of stealing and sabotage. This doesn't surprise me one bit.


So what? Is it a scandal that Walmart and Target both send employees into each other's stores and actively monitor prices on items? It's called being competitive, and to be competitive you have to at least match what your competitor is doing, then beat them.



It's interesting. A little bit like the browser wars, isn't it? Browsers are really similar to one another. If any noteworthy new feature appears in one, it is very likely to be copied by another, which is a very good thing for end users and is a reason why competition is good. At the end of the day, users want more or less the same functionality, no matter which browser they use. There are some differences in details and quality, but rather minor ones.

Both Bing and Google are targeted at the mass market and I think people expect the same from both. If Google does it right, there is nothing more to invent. And even if there is, it is probably pretty expensive. It is so much easier to copy than to invent from scratch, just to get something almost exactly the same as Google :)

I am really interested in what Bing could do to be REALLY different or better than Google. And if they did, Google would most likely do something very similar :)


They should have made one of the search terms "Agloe, New York"! [1]

Footnotes

[1] - http://en.wikipedia.org/wiki/Fictitious_entry


"Is this illegal?"

IANAL, but in certain jurisdictions, most certainly yes. Many countries have copyright laws that protect compilations of things that are individually not worthy of copyright, for example telephone books. Copying down an individual telephone book entry is of course not a copyright violation, but copying the whole listing in a systematic fashion is.

I'd guess that this law applies to search engine rankings as well - rankings/listings of individual items that are not protected by copyright, but where a lot of effort goes into producing the listing itself.


IANALE, but in the US where both Google and Microsoft are headquartered, directory listings are not protected by copyright (though the book itself can be).


I assume that Microsoft displayed these search results in e.g. Germany as well, where directory listings do enjoy copyright protection. So if they wanted, they probably could sue.


A few quick thoughts:

- generally speaking, the conclusion seems to be that for regular queries, Bing uses mostly other clues to figure out relevance, so this is basically a storm in a cup of water. Regardless, since both Google's and Bing's algos are closed-source, we're going on faith when either company says data gathered from one of their products doesn't affect search quality.

- the whole thing about making a ranking overrider and talking about it publicly seems like a stupid move. Why in the world would you say you developed such code and then "deleted it" in an all-code-is-version-controlled-these-days world? This won't go very well against the claims that Google gives preferential treatment to its own services (e.g. email, maps) vs competitors.

- The experiment reportedly was triggered because Bing results were getting better for misspelled searches. But, seriously, returning wikipedia as the top result for something with low levenshtein distance to a rare word is not exactly rocket science...

- if Google feels that its SERPs are the most relevant possible, shouldn't it make sense that competitors trying to improve relevance will inevitably end up showing the same results as Google on at least a subset of queries?

- if you're saying Bing has just as good results as Google, regardless of the means to the goal, then how does publicizing that help the whole "Google's overrun by spam" meme going on?


> But, seriously, returning wikipedia as the top result for something with low levenshtein distance to a rare word is not exactly rocket science...

Actually, that is Google's core business, and judging from the amount of revenue it's generated, most likely harder than rocket science. The mapping between keyword and website content/results is what a search engine is all about, and what Google does (arguably) well.

I believe Google is crying foul because it is the only source of the mapping between the keywords they made up and the website results. Bing did not have these mappings until they evaluated user clickthroughs that went through Google's results, with their browsing history going something like: 1/ http://www.google.com/search?q=keyword 2/ http://website.com

Now Bing is using the users' click history to generate the mapping keyword<=>http://website.com; this is the shady part: if Google did not generate its results, that mapping would never have taken place. The user would never have been able to tell Bing that there is relevance between the two unless Google existed.
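To make the browsing-history step concrete, here is a small Python sketch of how such a keyword-to-site pair could be pulled out of a URL trail. It is a guess at the general shape, not anything Bing has published: `extract_mappings` is a made-up helper, and treating any URL with path `/search` and a `q` parameter as a results page is a simplifying assumption.

```python
from urllib.parse import urlparse, parse_qs

def extract_mappings(trail):
    """Pair each search-results URL with the next URL visited.

    trail is an ordered list of URLs from one browsing session. Treating
    any URL with path "/search" and a "q" parameter as a results page is
    a simplifying assumption.
    """
    mappings = []
    for current, following in zip(trail, trail[1:]):
        parsed = urlparse(current)
        query = parse_qs(parsed.query).get("q")
        if parsed.path == "/search" and query:
            mappings.append((query[0], following))
    return mappings

trail = ["http://www.google.com/search?q=keyword", "http://website.com"]
mappings = extract_mappings(trail)
```

Nothing here fetches anything from Google; the pair only exists because a user walked from one URL to the other.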


>> Actually, that is Google's core business, and from the amount of revenue it's generated most likely harder than rocket science.

You're talking about all of the work needed to make a search engine good, I'm talking about the specific algorithm needed for that particular type of query (rare, obscure, easily misspelled word). Different scopes.

And again, my observation in that bullet point is that, imho, the "torsoraphy" type of query could have been improved by something like the "close enough to rare word? + does wiki page exist?" algorithm, rather than copying.
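Something like this Python sketch is what I have in mind. The edit-distance function is the standard dynamic-programming one; the set of titles is a hypothetical stand-in for "does a wiki page exist?", the correctly spelled title is my assumption about what the query was aiming at, and `max_distance=3` is an arbitrary cutoff.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]

# Hypothetical stand-in for "does a wiki page exist?": a set of known titles.
wiki_titles = {"tarsorrhaphy", "tarsier"}

def suggest_wiki_page(query, titles, max_distance=3):
    """Return the closest known title within max_distance, else None."""
    best = min(titles, key=lambda title: levenshtein(query, title))
    return best if levenshtein(query, best) <= max_distance else None

suggestion = suggest_wiki_page("torsoraphy", wiki_titles)
```

A real system would need an index rather than a linear scan over every title, but the point stands that no click data is required for this class of query.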

Re: recording click history being shady: I don't really see what's so fundamentally different between that and recording surfing habits via ads. It goes back to the first point: Google could say they don't use that data to improve SERP relevance, but we're going on faith on that claim.


The interesting thing here is that Google now has the smarts and power to play games with Bing, and were I MS, that fact would scare me more than any lawsuit.


Sounds like an easy way to trick Bing into altering search results as well.


Doesn't the Google toolbar do essentially the same thing they are saying IE is doing?


This is addressed in the article. It says that Google uses toolbar data to measure page load latency, and that this is the only way toolbar data affects page ranking.


It is widely assumed in SEO circles that Google uses toolbar data, among other sources, for finding new URLs to crawl. They're very enthusiastic about getting all pages on the public Internet into the crawl set. User data gets them there faster and more reliably than a hypothetical competing crawler using only e.g. the observed link graph.

Edit: I did some digging to see if I could find an authoritative source on this, and found that Matt Cutts specifically denies this particular usage of user data for expanding the crawl set on his blog. Mea maxima culpa.

http://www.mattcutts.com/blog/toolbar-indexing-debunk-post/

Edit the second: An amusing note on this general subject: Google will fuzz test certain search forms on, e.g., high value government websites to get at the juicy data behind them which would not otherwise be reachable from just traversing the link graph.


I seriously doubt that is the only thing they use the information for, and unless someone can find somewhere in the Google Toolbar/Chrome EULAs that specifically asserts what they will/won't do with the data, I'd assume they use it for all sorts of things.


I don't think that's true - at least, I've not seen any evidence. There's a good chance the Google toolbar sends back browser data, but I think it's extraordinarily unlikely that Google snoops on what people are searching for on Bing, and what results they end up clicking on, and then adjusts their search results on the back of that.


If you enable the "show PR" thing on it, but that is (or at least was when I last used it) off by default. And Google specifically say they won't use the data for this sort of thing.


Google likens it to the digital equivalent of Bing leaning over during an exam and copying off of Google’s test.

Isn't that basic classroom solidarity?


Doesn't this make Bing practically useless? I mean I go to an alternative search engine to get alternative search results...


Sure, but if it's the default in whatever app/browser you're using, you might not bother switching to Google as long as it gets decent results.


I suspect that this isn't the only way they determine search results :-)

My guess is that they have a relevance metric from their own algorithm, but some queries return results with poor relevance. Then they may go to this set of secondary results and say, "Are there any results for this query that a higher-than-expected percentage of users clicked on?" and then add those results to the list.

IOW, this data probably isn't the common case.
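Here is a tiny Python sketch of that guess. To be clear, it is pure speculation about the shape of the logic: `merge_results`, both thresholds and the data layout are invented for illustration.

```python
def merge_results(own_results, click_stats, min_relevance=0.3, min_ctr=0.5):
    """Fall back on click data only when the engine's own results look weak.

    own_results: list of (url, relevance) from the engine's own algorithm.
    click_stats: dict of url -> (clicks, impressions) for this query.
    Both thresholds are arbitrary illustrative values.
    """
    urls = [url for url, _ in own_results]
    best = max((rel for _, rel in own_results), default=0.0)
    if best >= min_relevance:
        return urls  # common case: click data is never consulted
    for url, (clicks, impressions) in click_stats.items():
        if impressions and clicks / impressions >= min_ctr and url not in urls:
            urls.append(url)
    return urls

strong = merge_results([("a.example", 0.9)], {"b.example": (9, 10)})
weak = merge_results([("a.example", 0.1)],
                     {"b.example": (9, 10), "c.example": (1, 10)})
```

With a confident result of its own the click data is ignored; only the weak query picks up the heavily clicked `b.example`.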


I'm really curious as to how this is different from Bing using Google's search results in some form of aggregate page ranking. If we assume that some arbitrary metric of "authenticity" exists for searches, and a search for mbzrxpgjys yields a low (<0.1%) authenticity score, but Google suddenly declares that www.page.com is the foremost authority on mbzrxpgjys, it stands to reason that a good page-ranking scheme would take that into account and bump it to the front of the line.

I don't think it's cheating. Nowhere in the article does it claim that they aren't doing their own search; they are just using Google's results as part of their own search algorithm. Is that really such a crime?


While the "cheating" angle on this seems hugely overblown, I do think that companies that harvest data through toolbars etc. should be obligated to explain upfront in clear language how they use the data. Not bury it in the legalese of a vast impenetrable ToS.


Give me a break, MS has always played the fast follower game which means they will ride on the work and investment done by the market leader and it's worked out well for them in other parts of their business.

Using signals from user behavior on the toolbar on ANY search engine seems to make a lot of sense when it comes to improving search results. MS employees are the biggest QA group for Bing. Internal tools allow employees to tag queries and results that are superior/inferior to Google. Both are displayed side by side and employees provide active feedback to help improve the algorithm and identify more systemic underlying ranking issues.


Instead of whining, I would have gone on the offensive. So we have a competitor copying our search results. Great. Now how can we fuck with that?

Figure out which requests are coming from Microsoft and return a different set of search results (e.g. XXX stuff) so that it doesn't show up in organic Google results. Set the trap, and once Bing has incorporated those results for a key term, spam TC and LOL at Steve Ballmer getting worked up.

This discredits the relevancy of Bing, and all those PR dollars spent on rebranding would have gone down the drain. Imagine searching for a harmless search term like 'poodle' and getting hardcore triple-X results.

Oh well, don't do evil, right?


> For the first time in its history, Google crafted one-time code that would allow it to manually rank a page for a certain term

If that's accurate, that's a precedent I'd rather not have seen.

(a little help on the grammar here, anyone?)


This is the ultimate opportunity for Google. Can't they somehow spoof the results that are sent back to Bing? I know that if someone was cheating off me in an exam, I would try and give them the wrong answer.


Wait.. MS is copying the competition!?


I'm not sure about this. It almost sounds like Google is posturing. The reason I say this is that while Google was getting bombed up until last week with scraper sites, Bing wasn't.

If Bing was really copying results, they would have reflected the spam sites, because people click on those when they are highly ranked just as often as they click on the originator site. After all, the problem is that the content is identical.


Well, the article suggests that there is more to it than mere copying. The way I see it, Bing is basically taking a look at what the user does when they search for a term that is unknown to Bing. Once they have this initial data, they are able to filter it as they see fit to avoid getting spam results.


The allegation is that Bing is/was copying results for long-tail queries. The specific example being some very misspelled words. So in no way is the claim made that Google and Bing results are identical (even for most of those long tail results, given that it's only the top ranking pages).


"We learn from our customers. We use our customer's data." That sounds like an admission to me, and he is really dodging and spinning the question.


>Is It Illegal?

Suffice to say, Google’s pretty unhappy with the whole situation, which does raise a number of issues. For one, is what Bing seems to be doing illegal? Singhal was “hesitant” to say that since Google technically hasn’t lost anything. It still has its own results, even if it feels Bing is mimicking them.

Funny... that's the exact same argument software / music piracy often makes.


Google wasting time embarrassing Bing? I think this is more interesting than learning what Bing are doing.

Google are clearly monitoring Bing (and others) as a matter of course. I'm interested to know what they'd have done if they'd found Bing providing better quality results. Would they have spent resources trying to figure out what Bing were doing right, or would that be "copying" too?


What was that saying: "If you can't beat em, join em"?

disclaimer: this is not my acknowledgement that I agree with the practice.


The biggest article surprise for me was Google's claim they don't use the toolbar or Chrome directly to improve search queries. I assumed measuring bounce rates and patterns in link graph traversal across the entire web was part of their raison d'etre, as with Google Analytics


> When the experiment was ready, about 20 Google engineers were told to run the test queries from laptops at home

An interesting side-effect is that Bing has in its logs the home IPs of the Googlers involved in this research (i.e., anyone who searched for "hiybbprqag" in Dec. '10).


Since when has reverse engineering been cheating? If the article is correct, there still is no allegation that Google's algorithms have been used. I don't think Google is in much of any position to cry foul over any company using data mining to tailor search results.


Buried halfway through the article:

"These searches returned no matches on Google or Bing — or a tiny number of poor quality matches, in a few cases — before the experiment went live. [...] Only a small number of the test searches produced this result, about 7 to 9 (depending on when exactly Google checked) out of the 100. Google says it doesn’t know why they didn’t all work, [...]"

The writer apparently thinks these results justify concluding the article with this takeaway:

"When Bing launched in 2009, the joke was that Bing stood for either “Because It’s Not Google” or “But It’s Not Google.” Mining Google’s searches makes me wonder if the joke should change to “Bing Is Now Google.”"

okayyyy.


If Bing is really copying, what about those who fought the Bing vs Google war as if it were the Vi vs Emacs war?

By any chance, is Bing named after Chandler Bing?

"DuckDuckGo" has become my default. It's awesome.


Also, did anyone take a look at the Windows XP source code? I suspect there might be a Linux kernel in there.


Perhaps Bing tracks clicks for every search engine, not only Google. If so, they are not copying Google, but legally tracking user behavior across the web.


Did anyone stop to think that Bing is leveraging IE to improve/influence search results regardless of the URLs this monitored traffic is gleaned from?

Google states 9 of 100 planted queries showed up on Bing. You think Amazon, Godaddy, and AOL could make similar claims?

Probably...but those examples aren't worried about their market share evaporating.


With Google's results filled with spam, and with me hardly ever finding a result on the first search, no sane search engine should even be replicating them (if they are).


In other news, Google has finally figured out why their search result quality has been steadily declining but won't comment on the specifics.


Google are really getting pathetic.


In the spirit of sensational, linkbait headlines, here's proof that Microsoft is copying Google's results: bit.ly/hTNYCW


Microsoft has long said they have an embrace and extend policy. I guess this is just taking it to a higher level.


bitchez x( [Down voters , u r welcome :)]


The balls on MS people are fucking amazing: "Harry Shum, VP of search development at Bing, responded by admitting that Google had uncovered a new form of search fraud, and said he wished Google had spoken to Microsoft about it before taking it to the press". So bing is either (a) scraping all web behavior out of ie, or (b) scraping G's search engine results, or (c) both -- and dude is pissy because G didn't give them time to get their lies together in private? Amazing.

ps -- there's a word for what MS's software appears to be doing: spyware.



TL;DR

If Bing can't find it for you, it will google it for you.


Inaccurate. “If certain users have Googled obscure keywords, results may subsequently appear in Bing for that keyword” is all the experiments showed. Bing is NOT fetching Google queries on its own.


uarhhh, he has taken my lollipop


It pains me to say it but... that's genius.


Andy Baio says it, it's ok. I say it and I get down voted.


Search for "hacker news" on both. The results are quite different. One might argue Bing is better because they don't have a duplicate result at the top.


That's not really relevant - the article says that by no means are search results identical for every search.


'hacker news' is an overly generic word... Bing only 'cheats' when it doesn't know.


Bing: powered by Google! Bing: now with Google!

(suggested taglines)


Bing Is Now Google!


Google scrapes websites to determine search results. Bing scrapes websites (Google) to determine search results. What's the problem?


Look at it like a math problem... Google takes all numbers and figures out the answer itself. Bing writes down the answer from Google's paper. They are hardly the same. Your comment is somewhat like saying "Hey, we're all writing down numbers"


This is not a good analogy. An academic environment has different strictures and norms than a business environment.

In a business environment, the relevant questions are:

1) does it break any laws?

2) is it profitable?

2b) is it consistent with the image MS wishes to project to its customers and/or regulatory overseers?

One person's unfair competition is another's brilliant hack.


My comment wasn't meant to reference whether it was wrong or right, just the logistics of what is happening.


Think of it this way: Your professor asks you for the name of the imaginary friend that the professor's son always talks about. You have no idea what the answer could possibly be, so in your head you choose "John," a reasonably common name with a very low probability of being right. Google answers with strong conviction in its voice that the answer is "Mark," another common name. You have no evidence to believe that your own guess is correct, so in the face of Google's apparent confidence, aren't you the least bit tempted to say that it's Mark and not John?


Customers care about results, not about how hard you worked to get those results.

This is a fundamental difference between student reward-schedules and rest-of-life reward schedules.

There's no credit for effort, unless you can spin it that way in your marketing / branding.


Horrible analogy. This isn't a test. This is a product.


Actually it is a test, and how well you answer it directly determines whether your business will or won't succeed.

Google has worked out their algorithms for processing the incoming data and generating an answer. Bing has apparently used Google's answer to check whether they are getting it right or not and, when not, subbed in the other answer.

While in business this may not be illegal, it is still very much 'not knowing the correct answer to the test'.


No, it's a product. Come on. Every product is a "test" of the market so in that one definition of "test", yes. But not in an academic sense, which is what you meant. Because only in an academic sense does the concept of "cheating" exist.


Look at it like a machine learning problem. If you slavishly match the test set you are doing something called overfitting. Your performance on the rest of the web will decrease, because you inherit bias from the test set.


If you think crawling is all there is to search engines, I have wget^H^H^H^H a search engine infrastructure to sell you.


I personally don't mind if one is copying the search results of another. The whole idea is to get the BEST search results possible, and that's what I use a search engine for: getting by far the best results possible. I don't mind how they do it, and as far as I can tell they aren't breaking any copyright laws...


In any case, if Bing is indeed copying Google's results, it is utterly unethical even if users are receiving accurate search results. As mentioned in the article, what Bing is doing is analogous to copying Google's "exam," which by any reasonable standard is wrong, even if a "proctor" didn't specifically state this rule.

Perhaps Google has patented a few of its search algorithm components.



