Microsoft’s Bing uses Google search results—and denies it (googleblog.blogspot.com)
511 points by atularora on Feb 1, 2011 | 196 comments



Setting aside the ethical questions, because I don't really care, when I look at the probable outcomes of this, I think it's wise for Google to point out what's going on here. This string of stories positions Google as the smart, sciency search engine, and Bing as a collection of hacks. This is how I'd want the public to perceive the battle if I were Google.

But even though this makes Google look good, PR-wise, Bing should still use this trick, if it makes their search results better. It seems like a short-term solution, but a good one to get their results more competitive while they work on the core problems Google has already solved. Google should call them on it and expose their hackery, so people know where the good search science still comes from, but Bing should still do it. They are both playing the game very rationally.

As an aside, I don't buy the arguments of "they shouldn't be mentioning Bing." This isn't like the POTUS running against some no-name congressman - this battle is already well-publicized, via hundreds of millions of dollars of ad buys by Microsoft, so the general public already knows there is a competition between Bing and Google.


Especially in the case of spelling correction, it is not so much that Google has "solved hard problems" to get the long-tail right, as that they have a monopoly on the relevant data. Unlike all their competitors, Google has 12 years of the entire history of queries and clicks to mine for signals about how to rewrite queries. Even if you have all of Google's algorithms, it is technologically impossible to build a better query rewriter, because you don't have their data. You can't buy this data short of buying Google, and if you believe Peter Norvig, it's an irreplaceable component of Google's quality advantage.

Microsoft, and any other would-be competitor, would essentially be committing suicide not to try to make up this data gap. If their toolbar is opt-in on the part of users, and you agree with me that my click history is mine to share with Microsoft if I so choose, this is helping consumers. Without some of this data, building a viable competitor to Google is impossible, and consumers do benefit from competition in web search.

Disclaimer: I work in Facebook search. Not the same thing as web search, and I don't really care whether Bing or Google "wins", though I'm temporarily rooting for Bing because as a user I want better, more competitive web search.


> Google has 12 years of the entire history of queries and clicks to mine for signals about how to rewrite queries

How about Yahoo? They have a longer history, I suppose. IMO it's all about quality and engineering.

More importantly, why has Google survived and flourished for 12 years?


They have a longer history, but waaaay less traffic. Also, their early business model was a paid directory, not organic search results.


They have waaay less traffic only since Google came onto the scene and stole their traffic by virtue of being better.


Exactly. Peter Norvig wrote an awesome spelling correction algorithm.

But now Google doesn't really need it (possibly they don't even use it), because pairs of queries, where the user first misspells and then corrects, give them a StringMap that they can use to map misspelled queries to correct queries.
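A minimal sketch of that idea, assuming a session is just the list of queries a user typed in order. The function name, the similarity threshold, and the minimum support are all made up for illustration, a plain dict stands in for the StringMap, and nothing here is claimed to be what Google actually runs.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def build_rewrite_map(sessions, min_similarity=0.75, min_support=2):
    """Map a query to the rewrite users most often typed right after it.

    sessions: iterable of lists of query strings, in the order typed.
    Both thresholds are arbitrary illustration values.
    """
    pair_counts = defaultdict(Counter)
    for queries in sessions:
        for first, second in zip(queries, queries[1:]):
            if first == second:
                continue
            # Only near-identical strings count as a possible spelling fix.
            if SequenceMatcher(None, first, second).ratio() >= min_similarity:
                pair_counts[first][second] += 1

    rewrite_map = {}
    for first, rewrites in pair_counts.items():
        best, count = rewrites.most_common(1)[0]
        # Require the same correction from several sessions before trusting it.
        if count >= min_support:
            rewrite_map[first] = best
    return rewrite_map


sessions = [
    ["mazad", "mazda"],
    ["mazad", "mazda", "mazda dealers"],
    ["mazad", "madza"],
    ["weather", "news"],
]
print(build_rewrite_map(sessions))
```

The support threshold is the part that needs volume: a correction seen once is noise, so every misspelling has to show up in several sessions before it makes it into the map.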


Exactly... to beat Google, you need time and data. Or you can do what MS did to fill the gap for a while.


I'm pretty sure Microsoft had a search engine before Google existed...


Actually, time won't do it either. Unless you somehow compete with Google today, the data will never come.

Google only gets the query volume it does because it is the quality leader. The query volume itself helps Google to retain its quality lead. Google likes to portray search quality as being algorithm-driven, and it is to some extent, but in the modern era quality is also about collaborative filtering with clicks. If you don't have the users, you don't see the clicks, and you can't have the quality. Web search is a natural winner-take-all monopoly, unless someone gets creative, which is what Microsoft seems to have done.


What volume do you really need, to get enough data to learn from? I'd think that 1% of Google traffic would still be a pretty big firehose to feed whatever learning algorithm you need to feed.

Don't Google, Facebook, et al run a lot of experiments for new projects on a subset of users/queries that's far smaller than 1% of traffic, and still yields very useful results?


In the case of spelling correction and query expansion, every little bit helps. Suppose you want to learn that people typing [mazad] mean [mazda]. (This is kind of a silly example, as dictionary- and edit-distance-based techniques can do corrections like this. So bear with me.) The event you need to catch is:

1. User mistakenly types a query [mazad], meaning [mazda]. (Probably less than 1% of total queries for Mazda, which is an infinitesimally tiny fraction of the total queries in your system.)

2. The user gets garbage results, and the user realizes their mistake and fixes it, rather than giving up in frustration. This is probably rather rare too, though

3. The user clicks through something that ranked highly for Mazda, and stays there long enough that your system thinks it is a "long click" that probably satisfied the user.

The golden datum here is literally a one in very-many-thousands-of-sessions event, and you need to catch a statistically meaningful number of them for every misspelling (or synonym, or whatever you're trying to learn from this data) you'd like to have your system learn. To have good coverage of the English language, we're talking about many billions of search sessions.
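The three steps above can be sketched as a filter over a session log. This is only an illustration: the `Search` record, the `golden_events` name, and the 30-second "long click" cutoff are assumptions, since the comment gives no schema or threshold.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

LONG_CLICK_SECONDS = 30.0  # assumed threshold; the comment gives no number


@dataclass
class Search:
    query: str
    clicked_url: Optional[str] = None  # None means the user clicked nothing
    dwell_seconds: float = 0.0         # time spent on the clicked page


def golden_events(session: List[Search]) -> List[Tuple[str, str, str]]:
    """Return (abandoned_query, corrected_query, url) triples from one session."""
    events = []
    for first, second in zip(session, session[1:]):
        # Steps 1 and 2: a query with no click, followed by a different query.
        abandoned = first.clicked_url is None and first.query != second.query
        # Step 3: the follow-up query ends in a click the user stayed on.
        long_click = (second.clicked_url is not None
                      and second.dwell_seconds >= LONG_CLICK_SECONDS)
        if abandoned and long_click:
            events.append((first.query, second.query, second.clicked_url))
    return events


good = [Search("mazad"), Search("mazda", "https://example.com/mazda", 95.0)]
bounced = [Search("mazad"), Search("mazda", "https://example.com/mazda", 4.0)]
print(golden_events(good))
print(golden_events(bounced))
```

Each extra condition throws away more sessions, which is why the usable events end up so rare relative to total traffic.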

A previous commenter pointed out that Yahoo! probably has enough data; I bet they're right. I don't know if Yahoo! and Bing's technology partnership included access to such data.


      collaborative filtering with clicks ... you don't see the clicks, 
      and you can't have the quality
So why build just a "me too" product instead of finding a way to make "collaborative filtering with clicks" irrelevant?

This is so typical of Microsoft btw.


I mostly agree with you, but I'm not sure the next frontier in web search is spelling correction. I feel like there have to be ways of producing better search results that work better than upranking the ones people click on.


Yes, but historically this is how Microsoft wins.

They win by entering the dialogue. They copied UI from Apple back in the day, and then were able to cut prices, etc., and Apple lost (at least for a decade).

I would caution Google to ignore the controversy completely and move on to more interesting problems in search. Microsoft is very good at playing this game increasingly on their terms, and then they win.

ETA. As a quick caveat. I have nothing but respect for many Microsoft employees. And I don't mean to generalize. I just can't help but remember the similarities to:

http://en.wikipedia.org/wiki/Apple_Computer,_Inc._v._Microso...

Being 'right' is sometimes a premature optimization....

ETA2. Esp. with someone who is very good at making the second dance move. But think back on all the major Microsoft products (IE v. Netscape, Office v. Lotus, though this is a reach maybe, Xbox v. PS, .NET v. Java, etc.). It's just a different approach. But it's a mistake to not analyze it and appreciate it for what it is.


Wait a second. Can't let you completely get away with the MS stealing UI from Apple comment when SJobs stole it from Xerox/PARC first. Although to be fair he does readily admit the thievery.


No "thievery" was involved, he licensed it in exchange for access to pre-IPO Apple stock.


Hmm,

How would you explain Xerox suing Apple over this?

http://www.nytimes.com/1989/12/15/business/company-news-xero...

I can't find a mention of Apple licensing anything in the wikipedia article either:

http://en.wikipedia.org/wiki/Apple_Computer,_Inc._v._Microso...

Also this claims to debunk the story: http://books.google.pl/books?id=gzAEAAAAMBAJ&lpg=PA3&...

"Xerox did at one time owne stock in Apple, they were purchased as an investment"

(All this followed from the Wikipedia page, I'll admit)


Hmm, it looks like you're right—it's just a legend. Thanks, I stand corrected.


I didn't mean thievery in the classic sense. Only in the sense that he/AAPL didn't come up with the innovation of the GUI as the parent was suggesting.


Yes, but when Microsoft took on Apple back in the day, they were young, agile and relevant. The same is much harder to say of the MS of today. They aren't winning very often anymore, especially in the consumer space. And I say this as a Microsoft alum who really enjoyed my time at the company.


> Yes, but historically this is how Microsoft wins.

What is Google made up of? The then MSFTs are now at Google.


This isn't a case of "copying" though. That would entail reverse engineering an existing product, which Bing is (according to this post) not doing. They are instead importing the search results themselves from Google.com.


Sciency? Their experiment appears to lack a control of any kind. They only use a Google search page as the source for the links, one that is specially engineered to return specific results for a specific term. Then they count the coincidences to support the conclusion that Bing is stealing search results from Google. Where is the non-Google page of links? Where is the comparison using links from a specially crafted bookmarks file? What is the margin of error? Where are the plausible alternative explanations? If this had been a science experiment, it would have been scoffed at for its narrow scope and predetermined outcome.


He meant sciency in the marketing sense. You can market science and not actually be scientific.


>Where is the non-Google page of links?

There is no non-Google page of links. The only association between these terms and the page exists on Google.

FTA:

>To be clear, the synthetic query had no relationship with the inserted result we chose -- the query didn’t appear on the webpage, and there were no links to the webpage with that query phrase. In other words, there was absolutely no reason for any search engine to return that webpage for that synthetic query.


Are you kidding? You are a moron - this is as close to a smoking gun as you can get. You did get the part that there was NO relationship between the meaningless search phrases and the returned page right? What POSSIBLE situation could explain Bing returning the Google returned page without recourse to Google's search?


Google created the relationship when they enabled user tracking features in IE and Bing Toolbar, then used a specially created Google page as the HTTP REFERER to the desired page. Go ahead and duplicate the experiment using a non-Google page or file. I'll bet Bing "steals" those results, too. Google seeded Bing, plain and simple.

Note also that the example search links in the blog post are all from the non-SSL Google search. User tracking doesn't necessarily rely on the HTTP REFERER in this case, since the browser already has access to all of the necessary information. But it would still be interesting to see the experiment repeated using the SSL-encrypted Google search, which disables referrer information, along with non-Google sources.


Do you really go around calling people morons? Is it just on the internet, or do you do it in real life where you're not anonymous too?

If it's the first case, then that's a pretty cowardly way to act. If it's the latter, then you should just be ashamed of yourself in general.

Calling names is not OK. I'm pretty shocked you got any up-votes for your comment at all, and I'm ashamed of everyone on the site who gave you the bump.

EDIT: At the time of posting this comment, he had +8. My faith has been restored in the HN community.


There's no clear indication that the clickthrough analysis is specifically targeting Google, vs. targeting any search engine or even targeting any page with that text in the URL query string, etc. Maybe it is, but they haven't shown it.


While I don't really agree with Google here, I don't think that distinction particularly matters. If you buy the premise that they're "stealing" from Google then it equally applies to everyone.


Fair enough, though I don't think that's the only POV - you could analogize it to ripping off an existing song wholesale vs. making some kind of collage of short samples from 10 existing songs.


>Are you kidding? You are a moron - this is as close to a smoking gun as you can get. You did get the part that there was NO relationship between the meaningless search phrases and the returned page right? What POSSIBLE situation could explain Bing returning the Google returned page without recourse to Google's search

Eh? Then why did it happen only in 6% or 7% of the cases tested and not close to 100%, a fact that the blog conveniently glosses over?


when I look at the probable outcomes of this...

In discussions of this, I'm always surprised that people forget that Microsoft has had the axe of anti-trust litigation looming over its head for years. They haven't been able to compete aggressively on a number of fronts, because they could easily be seen as anti-competitive, and end up back in the courts.

But, now, search is a market where they are clearly the underdog. And, Microsoft finally has an arena they can step into and fight bare bloody knuckled without arousing anti-trust eyebrows in the US or EU governments.

And, Bing is competing rather well. Bing launched 18 months ago, and they now have 30% of the total search market (Yes, they bought a good bit of that through Yahoo). And, they are finally presenting a viable competitor to Google in general web search, which I believe is good for all.

Not only that, but in a number of search verticals, IMHO Bing's technology is quite a bit better than Google's (image search, video search and travel search).

I think Google has a reason to be scared and start throwing punches. Microsoft has a much more diverse revenue stream, and they can afford to lose money on search for years without breaking a sweat. Google primarily has a single source of revenue: search advertising. It'll be interesting to see how Google reacts when its back is against the wall fighting for its only revenue stream.

I agree with the OP. This is going to be a fun fight to watch.


Playing PR games (even if you win) may make you look good at the moment but has a tendency to spiral into more PR games instead of a better product.

Also, as stated on the earlier HN thread on this topic, it isn't at all clear that Google will come out on top. Many users suggested that the Google toolbar collected data on the sites people visited and if the Bing toolbar does also -- well is that really a game Google wants to play? Bing could come out a net winner given the increased attention on their search engine.

Why does this last aspect matter? It is an acknowledgment that Google is taking their competition seriously, which indicates a bit of uncertainty with respect to their own product. The more that goes into this (esp. if Bing hits back w/ info on how Google historically has mined data with their own toolbar) the more that Google has to lose and Microsoft has to win.


As an aside, I don't buy the arguments of "they shouldn't be mentioning Bing."

I do buy that argument but only for mass media PR type stuff. The market leader can only give attention and raise awareness to the followers.

The thing is in this case, it's a technical blog posting and really anyone following that already knows about Bing. It was in searchengineland. If they had a link to it on their home page it would be a different story.


The result of this PR move does not decrease my opinion of Microsoft as a company, or Bing as a product. I held them both in fairly low regard already.

The result of all this is that I now perceive Bing to be considered more of a contender by Google than I previously did.

I doubt that is what Google intended. But then, this PR move was probably meant to change the opinion of people who had a higher opinion of Microsoft and Bing than myself, or were just casual users with no real position.

Either way, it does put Microsoft on the defense for something that I wouldn't construe as malicious.


Google should proceed with caution. Do they really want to get dragged into a debate about tracking user actions to influence search results?


Agreed, this is truly shocking behavior on their part.

1) They do the same thing. When I have the google toolbar installed and use bing, my clickstream data goes to Google.

2) On a micro level, there is nothing wrong with this at all.

Of course, on a macro level, both sites trample all over user privacy, so you should be using DDG.


To the defenders of Bing - would it be ok if Bing similarly used visits to amazon.com and various URL parameters to reimplement their "bestselling" lists for each category? I'm sure they could, and it seems clearly unethical. They are doing the same to google, right?


You still didn't get it: they are using clickstream from every possible source, from Amazon to eBay to Google. And there is nothing unethical about it, as it just gives them anonymous data about which sites people are clicking through to. Google gets the same data from Google Toolbar and other products.


I think there's a very clear ethical divide.

Using any Google product, you implicitly agree to them, in exchange, using your data. That's how Google works, and has always worked. That's your payment for using the service.

With Windows - the operating system that you are using - it's an entirely different proposition. For one, you've already paid for it. And secondly, you don't expect the software that you bought to spy on you and give away links you were clicking on in a Google search results page.

Links in a Bing search - sure! That's how search engines work. But tracking my clicks on any other web page, by my OS, that's spyware, plain and simple.


Google clearly stated in their blog post they used IE8 and the Bing toolbar with settings that provide user experience data to MS (for the Suggested Sites feature and such). I'm sure Google's toolbar and Chrome phone home if you let them, too.

Unless I'm missing something just running Windows isn't enough for MS to do the kind of data collection Google is claiming.


You fail to understand the issue here. None of this stuff is on by default in Windows. You have to install the Bing toolbar.


I doubt the debate would get this nuanced, but I believe Google only tracks users on their own site. It appears Microsoft is tracking clicks across at least one search engine.

I think this distinction has legal as well as technical implications.


This is factually incorrect. The Google Toolbar records clickstream data across all sites*

* all with some specific exceptions for perhaps SSL and locally-resolved zones, or something like that.


Fair enough, but the ethical thing is for Google to exclude bing.com from the clickstream analysis, and I hope they have.

The big search engines are large enough to show up clearly in any of their competitors' statistics (and there are always human teams monitoring the automated process) and should specifically exclude data from their competitors.

For Google to use Bing clickstream or vice versa also perpetuates a vicious circle where bad results from one search engine will spread among the others.

Bing either hasn't acted ethically, or hasn't thought through the consequences of absorbing clickstreams of competitors' sites.


Yes. And Bing toolbar does just the same. And when those clickstreams are used as one signal of link relevancy, you get the exact behaviour they are describing on the Google blog.
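To make that concrete, here is a toy sketch of clicks used as the only relevancy signal. The `rank_by_clicks` name and the log format are hypothetical, and a real engine would blend this with many other signals rather than use it alone.

```python
from collections import Counter, defaultdict


def rank_by_clicks(click_log, top_n=3):
    """Rank URLs per query purely by how often users clicked them.

    click_log: iterable of (query, clicked_url) pairs from opted-in users.
    """
    clicks = defaultdict(Counter)
    for query, url in click_log:
        clicks[query.strip().lower()][url] += 1
    return {
        query: [url for url, _ in counts.most_common(top_n)]
        for query, counts in clicks.items()
    }


click_log = [
    ("mazda", "https://example.com/mazda"),
    ("Mazda", "https://example.com/mazda"),
    ("mazda", "https://example.org/cars"),
    ("xqzzyplugh", "https://example.net/unrelated"),  # made-up nonsense query
]
print(rank_by_clicks(click_log))
```

For a nonsense query that nobody else has ever typed, a single recorded click is the only evidence available, so the clicked page becomes the top result no matter which site the click happened on.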


Thanks for pointing that out. Much Appreciated!


This whole accusation by Google was very childish... and I thought adult supervision was no longer required at Google ;)


Would a lawsuit be better? I think that this is a better problem to be sorted outside of the courts. Too often we rely on courts to solve our problems.


The holier-than-thou attitude by Google here dumbfounds me. Android phones and tablets would not exist in remotely the shape they do today but for the innovations of Apple. They organized with Apple's competitors to provide an offering extremely similar in spirit and often in form to what the iDevices do. There is zero chance Android adopts all the "conventions" it has without copying Apple. The world was not on a fast track to full phone, bright screen, touch capacitive displays and gestures and app markets before Apple pioneered them. iPhone was not the logical inevitable implication of the technology that had gone before. If Google thinks Bing is not playing fair, Google has 10x as much to answer for in real damages to Apple for thieving their innovation.

Some would say, "Well, Android has innovated on top of iPhone's precedents." So has Bing, right? In fact, I'd claim Android owes far more to Apple than Bing does to Google.

Some would say of Android copying iPhone, "Well, it's fair because we want competition in the mobile space, not for one company to dominate." Sort of like how Google dominates search? How much would I love for a true competitor to Google, so we can test, e.g., their policy of having terrible customer support.


Android existed back in 2003, well before the iPhone was available or even announced: http://www.businessweek.com/technology/content/aug2005/tc200...

Which is not to say that Apple hasn't contributed ideas to Android, but just an indication that some people were thinking about good mobile devices before Jobs came out with the iPhone (which, by the way, is a bright, colorful, phone-calling PDA, not all that different in principle from late Palm Pilots/Treos).


That last part is key here. Almost everything that the iPhone does existed before the iPhone. You can find most of the UI concepts throughout the old Palm app eco-system. Apple did a great job of bringing it all together and (most importantly) bringing a consistent UI metaphor to it all.

What Apple did was revolutionary, but they stood on a lot of shoulders to do it.


The relevant analogy to phones would be if the Windows Phone team took all of Android's source code--and somehow decompiled the proprietary bits--changed the branding, and distributed it as their product.

There's a huge difference between doing work that is influenced by someone else and just stealing someone else's work.


Microsoft is copying Google to a much higher degree. Google did not do an exact "copy/paste" like Microsoft did here. At least Google did it from scratch.


Google started with something, probably by scanning the Yahoo directory and starting from there. If they just pointed Googlebot to one site they would never get the whole web, they had to start from some source with a lot of sites that they could then branch out from.


And depending on the state of Yahoo's (or Lycos' or Altavista's) robots.txt at the time, that may be defensible.


There's an "all's fair in love and war" point you have here that carries some water, but there's a difference between reimplementing ideas and literally reading the output of your competitor and using that in your algorithm.


Here's an alternative hypothesis: the Bing toolbar might look for explicit search queries (either strings entered into a textbox, or q=, query= parameters), and navigation from such pages to external domains. This would match all "search engines" in the most relaxed meaning of the term: product search, thesaurus, lexicons, dictionaries, everything; and I'd argue it is a legit signal for a "general search engine" to use.
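A rough sketch of what such a generic rule could look like, using only the Python standard library. The `search_click` name is hypothetical, the hostname comparison is a crude stand-in for "external domain", and none of this is a claim about how the toolbar is actually implemented.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

QUERY_PARAMS = ("q", "query")  # the two parameter names mentioned above


def search_click(referrer_url: str, clicked_url: str) -> Optional[Tuple[str, str]]:
    """Return (query, clicked_url) if the click left a page that looks like a search."""
    source = urlsplit(referrer_url)
    target = urlsplit(clicked_url)
    if not source.hostname or not target.hostname:
        return None
    # Crude "external domain" check: different hostnames. Subdomains of the
    # same site would also count as external under this simplification.
    if source.hostname == target.hostname:
        return None
    params = parse_qs(source.query)
    for name in QUERY_PARAMS:
        values = params.get(name)
        if values and values[0].strip():
            return values[0].strip(), clicked_url
    return None


print(search_click("http://www.google.com/search?q=mazad", "http://example.com/page"))
print(search_click("http://shop.example/find?query=red+shoes", "http://other.example/item"))
print(search_click("http://news.example/story?id=7", "http://other.example/item"))
```

Nothing in the rule names a particular site: any page with a `q=` or `query=` parameter would be treated the same way, which is the point of the hypothesis.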

(Legit sidenote: Google has, via the use of Analytics data, a mass coverage of clickstream for the whole web, which are default opt-in, follows you everywhere, and can identify you uniquely. The Bing Toolbar at least asks first.)

If this is the case, Google isn't being picked on; rather, they are merely the first who figured this out externally. Cookie for the scientific rigor, but no cigar for the way they PRd the story. Correlation, after all, does not equal causation.


"Legit sidenote: Google has, via the use of Analytics data, a mass coverage of clickstream for the whole web, which are default opt-in, follows you everywhere, and can identify you uniquely. The Bing Toolbar at least asks first."

Google does not use Google Analytics data in any way in our rankings. I've said that plenty of times before, but it's worth mentioning.


Honest question: why not? Surely identifying sites that have disproportionate organic traffic relative to search engine referrals can only be good in identifying places people actually want to visit online?

As a webmaster I would opt-in for this sort of thing in a heartbeat if I thought it would help your algorithms understand my site. I'm sure Joel Spolsky and most other legitimate online publishers would do so too.


Google can already calculate this ratio of organic traffic to search engine traffic using the Google Toolbar stats - no need for Google Analytics.


It will result in a positive feedback loop.


Though Analytics data IS used for things like Double Click Ad Planner.


However, Google does use Google Toolbar data for rankings.


Well, while doing this they clearly knew that Google is by far the biggest search engine in the world, and naturally most of the data would be coming directly from Google. Right?


Why does that matter? They're only "stealing" results from Google insofar as people (/Google employees) used Google in order to illustrate what pages they want returned given a certain search query. I have a sneaking suspicion that the exact same thing would have happened if they were in the position to do the same query bombing on Ask Jeeves or crappywebsearch.net as it did for Google.

Search engines to me are an obvious case of a means to an end. If a search engine better than Google were to come out tomorrow I would switch to it (from Google) instantly with no regrets. Google's sense of propriety about their results (or, more accurately, what users clicked on after searching via Google), especially given the fact that they are well-known for their penchant for sucking in user data like a black hole (not that I care-- I want them to use it if it means better searches), to me seems 9 parts hypocritical and 1 part prima donna.

Need people be reminded that this is the same company that "accidentally" logged users' WiFi browsing habits while driving StreetView cars around Europe? Give me a break. Everyone is guilty, and no one is going to do anything differently now than they did before.


It matters because if you use your main competitor's results and ranking to change your results and ranking, you can no longer claim you are original, innovative or better. It just taints everything you did.


They didn't use either. Certainly not the ranking. Also, you're obviously better if you do-- in fact, you're better by definition. :) (innovative and original, perhaps not so much)

Why does the user's click from the results page suddenly belong to Google (apart from the fact that in this specific case they actually artificially created a fake long-tail result)? If I Google Bing, and then Bing's ranking of Bing goes up as a result (not that it's not already #1, but whatever), can you actually say that it's Google's result and ranking? What if it's nytimes, or any number of extraordinarily common searches where you're really just doing a domain lookup for a name you already know?

What if I didn't click on anything until the 30th page of results because that was the only useful result, and it causes the Bing rank to go higher? Does Google have any ownership over the rank then, even if the useful page was ranked lower than much more useful results? Couldn't Google then just return a list of every page on the internet in response to every query and then claim that their results are being stolen?

To be honest, I'm not really convinced that either side is in the right here. I just think that it should be made clear that there is a large distinction between stealing results and tracking clickthrough behavior. One would be laughably shortsighted and of dubious ethics; the other is basically common practice, and is being made out to be a bit more than it is because of its superficial appearance.


They definitely used results from Google.

They use results for terms users entered into Google to crawl pages that are not in their index (the torsorophy example, which is not an artificial one), therefore enriching their index based on Google's results and increasing their depth.

As for ranking, it is more blurry, but when you record users' clicks, which directly correlate with ranking, it starts stinking.


Google uses Google Toolbar to record users' clicks, which directly correlate with ranking.


This, in fact, could be quite wrong. Consider the fact that most modern websites include a search engine of one form or another, and the usage graph of the web has an extremely long tail.

Even considering the top 100 most visited websites on Alexa: all of them have a search form, and only 20 or so belong to Google; it's very easy to see how the aggregated usage of the other 80 could be much, much higher than the aggregated usage of Google properties.

Therefore, while Google might be the single most impacted organization in the world, most of the data comes from non-Google properties. And none of this has anything to do with my original argument of the algorithm itself being benign.


I'm guessing they're only looking at the Referrer field, and they're doing it across the board. (For me, Google's result pages all point back to Google, so they can leak this info via the Referrer field.)

I'd also guess that the data from domain-specific sites is more valuable than the data from generic search sites. (User selects appropriate site, does search, selects appropriate result.)


From http://searchengineland.com/google-bing-is-cheating-copying-...

>Suffice to say, Google’s pretty unhappy with the whole situation, which does raise a number of issues. For one, is what Bing seems to be doing illegal? Singhal was “hesitant” to say that since Google technically hasn’t lost anything. It still has its own results, even if it feels Bing is mimicking them

This is actually just IE's "spying" working properly. If an MSIE user that has allowed Microsoft to see their browsing habits follows a link after a search then MS are associating that link. This is sensible as it's measuring actual visits following a given search.

If someone searches for a googlewhack and Bing have no results for that term then it's natural that MS would then use this data to associate the googlewhack with the visited page.

Initially I thought this sounded like MS being underhand but really they're tracking their users and associating their users' search terms with the pages that they visit - _not_ using this data for search (given they have permission) would be silly, no?

The flag this waves for me is how easy is it to manipulate Bing results using false MSIE reports back to MS, anyone know of botnets sending fake data to boost page rankings??


>The flag this waves for me is how easy is it to manipulate Bing results using false MSIE reports back to MS, anyone know of botnets sending fake data to boost page rankings??

You sir, have won the thread.


The number-one point I take away from this isn't about ethics or what is "right".

It's that Microsoft has no confidence in Bing. They aren't willing to trust their algorithms to produce the best search results. They've decided that, some portion of the time, the single best search result they can return is whatever Google is returning.

They've given up on trying to be better than Google, and are settling for being a cheap, off-brand knockoff that rebrands stale Google search results.

That's rather shocking, and I frankly thought the Bing team were better than that.


Disclosure: I'm interning in the Bing team at MS but my statements here are mine only and do not reflect that of Microsoft.

I don't see how this indicates that they've given up on Bing. They're spending a lot of money on the online services team, even making a loss for the past several quarters to improve Bing. They're playing the catchup game, so this is a quick and easy way to stay competitive while they get their algorithms up to scratch. It's better than the alternative of losing all their customers/market share and then having no data to help them improve.

I see this more as a compliment to Google, even though Google certainly doesn't see it that way, and I can definitely understand their frustration.


"[Bing is] playing the catchup game, so this is a quick and easy way to stay competitive"

Since you're interning in the Bing team, I'll ask: did a lot of people at Bing know about Google's rankings as a data source in Bing? I'll understand if you can't answer, but I'm genuinely curious.


They've decided that, some portion of the time, the single best search result they can return is whatever Google is returning.

Sounds like a very rational position. Anything less could be termed 'delusion' on Microsoft's part. And for the longest time, they were delusional about the search market in general. No longer.

They've given up on trying to be better than Google, and are settling for being a cheap, off-brand knockoff that rebrands stale Google search results.

Google being an additional source of data for them does not equal Bing "giving up on bettering Google".


However, you really can't blame Google for pointing out the fact that when you're using Bing, some portion of the time, you're in fact getting stale Google search results.


Agreed.


This is not correct; the algorithms used to produce search results rely on user input over time. The proper rewording is "They've decided that, some portion of the time, the single best search result they can return is whatever users are using."


If you have no information about a URL, algorithms aren't going to help you. In that case, falling through to what your competitor has indexed seems like a pretty smart idea. I don't really understand all the hand-wringing.


Maybe the explanation is simpler than that. Perhaps they do trust their own algorithms, but their algorithms contain vulnerabilities that are easy to exploit. Maybe Google figured out how to use this weakness to manipulate Bing's results then spin it to make it look like Bing is stealing from Google. Google certainly knows many of the pitfalls a startup search engine will face. And anyone who's ever had to decipher the madness of msnbot in web logs over the years can tell you the Microsoft search engine team has made some monumental blunders that weren't intentionally malicious. In fact, Google may have just helped Bing by exposing a weakness that they can now patch.


Will any Bing users suddenly switch to Google because of this? Probably not. Will people who've never heard of Bing be reading about them in the paper tomorrow? Yes. Will complaining about competition from underdogs make Google look bad to some? Yes.

I don't see how making this a public issue is a win for Google. Seems like something they should have kept in their back pocket. "Keep your enemies closer", as they say.


Will this cause people who haven't heard of Bing to know about it? Yes. Will the headline "Google claims Bing is copying its search results" leave a good first impression on those users? Probably not.

Whether or not people perceive the claim as accurate or complaining or whatever, first impressions are still powerful. If the first time you hear of Bing is the accusation that they're copying someone else's results (true or not), that's probably not a first impression you would want people to be left with if you were Microsoft. Chances are, people are going to read this as "My trusted search provider accuses unknown search provider created by the same people as Windows Vista of stealing its results".


I really doubt there is a person out there who will read about this tomorrow and has not heard of Bing, considering how much advertising Bing still does. You also have to consider how any CNN type article of this post will read; the points that Microsoft lied publicly and "stole" results from their competitor, just to keep up, will stand out. There has been enough of a negative buzz for Google about how SEO-ized their results are, so it certainly seems like they have reason to bash the quality of their competitor right now.


If the privacy bit is properly reported in MSM, it should definitely raise some concerns. Think corporations with sensitive data and all that.


The thing is, Google monitors which search results you click, too. Bing's toolbar just does it via browser integration.

Google can and does track clicks from Bing via Google Analytics. Every time you click a result on Bing and land on a page using Google Analytics, Google knows about it, and they record your Bing search terms from the referrer. The same is probably true for pages with Google ads.


But as Matt Cutts pointed out, there's a very clear red warning on Google Chrome while IE just silently gives you a very unsuspicious disclaimer.


What does Chrome have to do with this mode of tracking? Google Analytics will capture a referrer from any browser that includes it, and no browser that I know of displays a "red warning" when including it.


NoScript provides a yellow warning, and is by default opt-out.


We don't use Google Analytics data in any way in our rankings. I've said that plenty of times before, but it's worth mentioning.


I didn't say you did, Matt. It seems like my post is being misinterpreted, so I'll try to clarify: The parent post was concerned about "the privacy bit". What I'm pointing out is that the same information (search terms in the referrer) is captured by Google Analytics (and by Apache logs, and by advertisers, and by Facebook via "like" buttons, etc, etc). From a privacy standpoint, there's nothing particularly new going on here, IMO.
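As a rough illustration of how little it takes for any site to read search terms out of the referrer, here is a sketch using Python's standard `urllib.parse`. The table of query parameter names (`q` for Bing and Google, `p` for Yahoo) is an assumption about how those engines encode queries in their URLs, and `search_terms_from_referrer` is a hypothetical helper, not anything Google Analytics is known to run.

```python
from urllib.parse import urlparse, parse_qs

# Assumed mapping of search engine domain -> query-string parameter holding the search terms.
QUERY_PARAMS = {"bing.com": "q", "google.com": "q", "search.yahoo.com": "p"}

def search_terms_from_referrer(referrer):
    """Return the search terms embedded in a referrer URL, or None if there are none."""
    parsed = urlparse(referrer)
    host = parsed.hostname or ""
    for domain, param in QUERY_PARAMS.items():
        if host == domain or host.endswith("." + domain):
            values = parse_qs(parsed.query).get(param)
            return values[0] if values else None
    return None
```

Anything that sees the request headers (server logs, analytics scripts, ad tags) could run the same few lines.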


Good for Google.

Bing have done wrong (granted probably not legally), and their response to a very detailed Search Engine Land article was a quick, nonchalant 'Huh? Oh that. Yeah, we don't copy Google's results. I know that doesn't really answer the claims but we don't really care enough to give a proper response.'

Bing's actions here (and their response) have seemed very poor and I definitely praise Google for going public with this.

I'd certainly like to think that if I was in a position where I caught a competitor piggybacking off my work, I'd go public with the information too (in a non-confrontational manner of course, as Google are doing).

So yeah: good for Google. Bad for Bing.


As many others have said, I don't think that using click data from the browser/toolbar as one of thousands of signals can be considered "copying". When doing a query with a nonexistent word, all the other signals are zero because there is no knowledge about the query, so the only remaining one is the history of clicks "sniffed" from Google/etc. SERPs. OTOH, on real-world queries the signal probably has a relatively low weight.
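A toy model of that argument, assuming ranking is a simple weighted sum of signals. The signal names and weights below are invented for illustration and say nothing about how Bing actually combines its inputs.

```python
# Hypothetical weights: the click signal is deliberately small.
WEIGHTS = {"text_match": 0.6, "link_authority": 0.35, "click_signal": 0.05}

def score(signals):
    """Weighted sum of whatever signals are available; missing signals count as zero."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())

def rank(candidates):
    """candidates: dict mapping url -> dict of signal values. Returns URLs, best first."""
    return sorted(candidates, key=lambda url: score(candidates[url]), reverse=True)
```

With a made-up query every other signal is zero, so even a low-weight click signal decides the ranking; with a real query it is outweighed by the rest.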

I don't think it's a secret that Bing uses click data from the browser/toolbar as a signal; it's just not a well-known fact. For example in the paper "Learning Phrase-Based Spelling Error Models from Clickthrough Data" (http://aclweb.org/anthology/P/P10/P10-1028.pdf) by Microsoft Research, they explain how to improve the spelling corrections by using click data from "other search engines".


The paper you mentioned appears to be saying that Microsoft is extracting spell corrections via clicks on Google. That's pretty surprising news.

I just pulled down the paper and noticed this: "The clickthrough data of the second type consists of a set of query reformulation sessions extracted from 3 months of log files from a commercial Web browser .... In our experiments, we "reverse-engineer" the parameters from the URLs of these sessions, and deduce how each search engine encodes both a query and the fact that a user arrived at a URL by clicking on the spelling suggestion of the query – an important indication that the spelling suggestion is desired"

Some of the recent discussion has been about whether Microsoft looks at lots of different sites vs. doing something special or different for Google. This paper very much sounds like Microsoft reverse engineered which specific url parameters on Google corresponded to a spelling correction? Figure 1 of that paper looks like Microsoft is using specific Google url parameters such as "&spell=1" to extract spell corrections from Google.
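For readers wondering what "reverse-engineering the parameters" would amount to in practice, here is a minimal sketch. It assumes the query lives in a `q` parameter and that `spell=1` marks a click on a spelling suggestion, as the figure described above suggests; the function name and session format are hypothetical and are not taken from the paper.

```python
from urllib.parse import urlparse, parse_qs

def spelling_pairs(session_urls):
    """Given the search URLs of one browsing session in order, return
    (original_query, corrected_query) pairs where the user clicked a spelling suggestion."""
    pairs = []
    previous_query = None
    for url in session_urls:
        params = parse_qs(urlparse(url).query)
        query = params.get("q", [None])[0]
        # Assumed marker: spell=1 means the user arrived via a spelling suggestion.
        if params.get("spell") == ["1"] and previous_query and query:
            pairs.append((previous_query, query))
        if query:
            previous_query = query
    return pairs
```

Each pair is a candidate training example of the form "users who typed X accepted Y".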

Targeting Google specifically is quite different than using lots of clicks from different places. It looks like you work at Microsoft--can you say any more about this?


> The paper you mentioned appears to be saying that Microsoft is extracting spell corrections via clicks on Google.

Well, no, that's a research paper that says that they have made experiments in that direction, but this doesn't imply that this is currently done in Bing. But it gives a hint about what kind of data is available from the "log files from a commercial Web browser".

> Targeting Google specifically is quite different than using lots of clicks from different places.

From the article, they have handcrafted rules for both Google and Yahoo, which together with Bing have (I think) 95% of the market. I'd say they are not targeting Google, they are targeting the majority of search engine users. There just happen to be only 3 major search engines, so a few handcrafted regexes are sufficient.

I wouldn't be surprised if Google Maps has handcrafted (or manually tuned) scraping code to extract reviews from Yelp and other major review sites, and same for Google News for the extraction of the news body from the major online news sources. How is this different?

> It looks like you work at Microsoft--can you say any more about this?

Yeah, I should have been more clear about this. I am interning at MSR and have some involvement with Bing (and actually worked there last year), but my comments are personal and about facts that are public.

BTW, IMHO using the click logs can't be considered "copying", more like "a way to discover new sites to crawl and the keywords that lead to them". This is not copying the SERP results.

Since it "looks like" you work at Google :) can you answer this question (it was also asked here: http://news.ycombinator.com/item?id=2165963)? Doesn't Google use Chrome to get traffic statistics, through the opt-in "send usage statistics" and the malicious site protection?


>I wouldn't be surprised if Google Maps has handcrafted (or manually tuned) scraping code to extract reviews from Yelp and other major review sites, and same for Google News for the extraction of the news body from the major online news sources. How is this different?

Sorry, but Google drives traffic to their sites. That's what a search engine is supposed to do. Msft just scrapes Google's results and presents the data as its own.


> Sorry, but Google drives traffic to their sites. That's what a search engine is supposed to do.

Then why are newspapers not so happy about it? http://www.guardian.co.uk/media/2009/nov/09/murdoch-google

And, BTW, just to be clear, Msft can't "scrape". That would violate robots.txt.


> Then why are newspapers not so happy about it?

Rupert Murdoch and his kin are shortsighted, blustering fools when it comes to the 'net. Relying on their attitude to make your point is counterproductive at best.


"Doesn't Google use Chrome to get traffic statistics, through the opt-in "send usage statistics" and the malicious site protection?"

I saw that Peter Kasting from the Chrome team commented on this question at http://www.mattcutts.com/blog/google-bing/#comment-712619 . Here's what he said: "I work on Chrome and we absolutely do NOT collect clickstream data through Chrome. Not even when you turn on the off-by-default “anonymous usage statistics”."


I'm a bit confused - see this image, from Google Analytics on my blog. http://i.imgur.com/oWK8q.png

Google Analytics knows that the search term 'autodesk revit devlopers guide' on Bing led someone to my blog. I take it this information is in the HTTP header on the request to my site, which the Google Analytics code reads.

If Google were to use Google analytics information in their search results, how would that be any different to what Bing is doing? Or is the distinction that Google claims not to do this?


From http://analytics.blogspot.com/2010/07/will-using-google-anal...

"[Google] Search Quality in general does not use Google Analytics in ranking ... You can use Google Analytics, you can not use Google Analytics, it won't affect your ranking within Google Search results." It's dated middle of last year, I guess it's possible that something has changed, but nothing I'm aware of.


The referer is in the http headers. Whereas Bing is using IE toolbars to upload Google search results.


We don't use Google Analytics data in any way in our search rankings. The reason is that we wouldn't want a site's rankings to benefit just because they chose to use Google Analytics.


Isn't it bad form from a marketing perspective to continually mention their top competitor? Does Apple mention android so extensively in their press?

As Paul said, customers don't care. All they are doing is giving Bing some front and center advertising on its blog (which has several non-tech readers) and the tech people who actually care probably don't care enough to actually switch search engines.


Well, if you notice, they toed the water first.

Google didn't fire right off the bat with the hard-hitting blog entry, but instead basically gave a more detailed version of the same thing to Danny Sullivan. They wanted to see how Microsoft would react before going official with it, because even though Microsoft's response was predictable, there's always a chance that Microsoft would have surprised everyone with their response. (They didn't, in my opinion.)

What's struck me most about this story as it has developed throughout the day is that Google's actions are very deliberate and planned.

I wouldn't consider Google as continually mentioning Bing, either; in fact, I don't think they've paid much attention at all to them. Put Bing in the search box on their official blog, and you'll see that this is the only post specifically about Bing -- a perusal of older posts indicates that the rest are hitting on comments or TrackBacks (i.e., the background image misfeature).


If they mentioned bing in a super bowl commercial I'd agree with you but this is a tech blog. I believe there's a difference in the marketing results between the two.


I think that customers do care a lot of the time. Especially customers of tech companies, who often become evangelists for their chosen platforms and devices.

In this case, I don't know how this discussion (verging on disagreement) could occur without mentioning the competition.


It's funny that Google has a problem with Microsoft using their content (the search rankings), yet has no problem taking content from places like TripAdvisor (http://www.tnooz.com/2010/12/14/news/tripadvisor-shrugs-off-...) and newspapers (http://www.google.com/url?sa=t&source=web&cd=5&s...), even when those companies specifically ask Google not to.

Google should be more careful here: either it's OK to repurpose other sites' content or it's not, and Google has built their entire business around repurposing content. They shouldn't be surprised when their competitors start doing the same.


I think they (google) can really only cry foul if there is specific code in the Bing toolbar that targets google's search results.

The way that they describe the approach, it seems like the Bing Toolbar would also be scraping results from bing itself, yahoo, altavista, ask.com and many others.


>Bing Toolbar would also be scraping results from bing itself, yahoo

Isn't Yahoo search powered by Bing anyway?


These seem to be the relevant parts of MS's responses:

"Opt-in programs like the [Bing] toolbar help us with clickstream data, one of many input signals we and other search engines use to help rank sites."

“We do not copy Google’s results.”

I see MS denying _copying_, not denying _using_ Google search results. That makes the title of the Google blog post incorrect.


Google developed an impressive spell correction and error detection algorithm to improve their search results.

Microsoft inadvertently benefits from Google's research by simply watching and recording how people use Google. The end result is a Microsoft product that isn't as good as its competition, but it's good enough for some people. Sound familiar?

It's a classic case of true innovation vs. "Microsoft" innovation.


Some perceive Google's stand as hypocrisy, e.g. http://twitter.com/#!/counternotions/status/3256864602692403...


That tweet is drawing a very specious connection. Android did not, as far as I can tell, copy any of Apple's algorithms or piggyback on top of them. It is a novel implementation of some of the same ideas in the iPhone.

Similarly, it's plagiarism if you take a Harry Potter book and publish your own version with the names changed, but James Patterson's "Witch & Wizard" has a copyright of its own despite being rather similar in concept.

(Edited to remove question about phrasing thanks to atularora's clarification.)


While Android is obviously inspired by iOS, it's not a direct copy. A better example of Google's hypocrisy is their outright copying of news articles into Google News despite the source companies asking them not to.


Headlines and snippets, and they respect robots.txt. The full articles they have are licensed from the AP.


What about the case of TripAdvisor?

TripAdvisor says "Google, don't copy our reviews for Google places."

Google says "The only way we won't copy your content is if you opt out completely."

TA says "We can't do that, you're the only search engine there is."

Google just laughs maniacally.

http://www.tnooz.com/2010/12/08/news/google-places-blocked-f... http://www.tnooz.com/2011/01/11/news/tripadvisor-content-on-...


I don't see how that's relevant at all. Google gave them the option of not being indexed. They decided they would rather be indexed by Google than not. The complaint that TripAdvisor doesn't get precise, fine-grained control over what Google does with its index seems like a fairly different issue.


The tweet author might be referring to recent claims that Android contains non-licensed Sun code.


That could be, but it would make even less sense to me. If they're talking about copying Apple, at least they did copy something from Apple, if only an idea. The copyright infringement claims have never been substantiated at all — it's just an accusation.


standby -> stand. Sorry for the typo


I think this is silly. Unless Google can come up with a copyright claim on its search results, and I seriously doubt they can, they have nothing to complain about. I use Google's search results too -- information you pump out publicly can be used to the advantage of your customers as well as your competitors. If Google is scared that Bing is "stealing" their search results, they should quit making those results public in a way where people can "steal" them. Accept that freely available information is freely available or clamp down and stop publishing information that might help your competitors. In Google's case, unfortunately, the info that helps Bing is also the info that is essential to Google's customers.

I know of some other media companies that are hyper-paranoid about their mass produced, widely disseminated, public content being "stolen" by others, maybe Google should set up a lunch date with the RIAA.


It's an appeal to the average person's "fair play" instinct.

I'm not saying that they should or shouldn't do something like this, but it seems very effective. They have a perfect narrative for people to wrap their heads around, and even if nothing illegal was done, it still feels like Microsoft is doing something "wrong".


Your argument is seriously flawed. Just because something is made publicly available doesn't mean it's free and clear for anyone to reproduce.

For example, if I don't chain my bike it doesn't mean anyone is allowed to steal it. Taking my bike is still a crime. How I protected it has nothing to do with the criminal act.


Stealing a bike necessarily deprives a person of his property. Copying your competitors when they do something people like is a time-honored tradition and certainly not illegal as long as the copying occurs along certain guidelines. If Bing was scraping Google en masse and just putting a "Bing" interface on it, Google might have something here, but looking at Google's results and deciding they have some better results (and it's unclear if they even did that) and then making sure you have the same "better" results on your page too is not wrong, and deprives no man of his property.

This is merely Google whining that Bing is delivering sort of good results because that threatens Google.


The stealing a bike example goes to counter the parent's argument that they can't complain if they don't protect their content.

As to whether this is a crime or not would probably be a copyright issue. If Google had a valid legal claim I bet they'd make it in court.


Yeah, as stated I really don't think they'll be able to claim a copyright infringement based on this, one link is not a copyrightable work.

However, it's not that what Bing did is not criminal, it's that it's a non-issue; there is nothing wrong with keeping an eye on competitors directly or indirectly and implementing good ideas or products that you don't have. Everyone does this, it's just normal business.

I don't know why Google cares or is making a big deal out of it, it just seems like whining to me, like they are mad that Bing can just manually compile a list of good results, even if those results come partially from Google. Sorry Google, that's just the nature of the format you're in and the game you're playing. A list of automatically-generated links with no custom or special content, much less a single link, would be a hard case to claim copyright protection, especially since for most actual queries it would be difficult to prove that Bing couldn't have come up with it independently.

I see nothing wrong with it and nothing unfair about it. Do you think the Google guys developed and tweaked their algorithms in a vacuum when they were starting out? You don't think they ever brought up AltaVista or Yahoo! results for comparison and tweaked until they got the same (better) results? I've heard several times that Google constantly has people manually tweaking queries and results, totally extra-algorithmically, to make sure they provide good results for everything possible. So what if Bing does this too? Bing is even being accused of something much less direct. I just see no problem at all, it's just the way this is played.


> If Google is scared that Bing is "stealing" their search results, they should quit making those results public in a way where people can "steal" them.

I'm curious how you'd implement that idea.


It's practically impossible for Google.


Simple: you just have to put a EULA at the top of every search result page... ;)


Especially since PageRank is "looking over the shoulder" of every content creator of the internet to do its thing.


Why do people think there's anything wrong here? Here's a (hypothetical) similar example:

In the 1990s, it probably took a lot of iterations, user studies, and market research to decide that copy/paste, undo, etc. were the "right" set of features to include in a word processor. Do you think Google Docs re-did all that research? No, of course not. They probably just looked at Word and said, "we need to support these features". And there's nothing wrong with that. This is the exact same thing.

If you have a product out in the market, it's fair game for your competitors to look at and analyze its strengths and weaknesses, and use those to improve their own product.


There's a difference between looking at word's copy/paste and deciding you want to make your own and making your copy/paste a hack around Word's copy/paste.


I see it more as copying the toolbar item by item.


This is an excellent short term PR tactic, but Microsoft can just say that they're not copying Google's results, they're just using user click data to improve their search results and that sometimes that click data happens to come from users' google searches.

It's understandable why Google's concerned, because it's likely that Microsoft has access to a lot more of this data due to their OS and software's ubiquity.


I can only imagine the confusion the Clyde-Findlay Area Credit Union SEO team is going through right now:

"Why are we getting so much traffic from people searching for 'delhipublicschool40 chdjob'?"


I thought the same thing about Team One Tickets:

"Is hiybbprqag some new band the kids like? Why don't we have tickets for them?"


More interesting than anything Microsoft is doing here is Google's answer to it. If Microsoft caught Yahoo doing this, they'd bury them in lawyers. Google is confident enough to just go public and take the PR win.


I guess they gave up the idea of suing because they were unlikely to win. Even if Bing was directly scraping Google's results from Bing's servers, it's iffy because robots.txt is not something that's enforceable, and listings of things are not copyrightable. See http://en.wikipedia.org/wiki/Feist_v._Rural


It is unclear whether the Google results appear in Bing only on the same computer the initial queries were made from (i.e., it is a personalization of search results) or whether Bing uses those results for other users too.


>"We gave 20 of our engineers laptops"

OK, over the course of several weeks 20 google engineers were able to inject 7/100 false searches into Bing's database. That is more structured like a brute force attack than a scientific experiment. Is Google really surprised that SEO works? [edit]The blog contains nothing significant about methodology - no control groups, no restrictions on automation, no limits on methods used. In other words, what this shows is that 20 Google Engineers were able to hack Bing and that they did so for PR purposes.[/edit]


"so Google set up a honeypot – some made up words like [hiybbprqag] linking to random unrelated sites."

That's not my understanding of what Google did at all. Google fed back search results for keywords that didn't exist on the Internet -- period, and they started eventually showing up in Bing.
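For anyone who wants to picture the mechanics of such a test, here is a rough sketch: generate nonsense queries, plant a chosen result for each, and later measure what fraction show the same result elsewhere. The function names and data shapes are made up for illustration; this is not Google's published methodology.

```python
import random
import string

def make_synthetic_queries(count, length=10, seed=0):
    """Generate `count` distinct random lowercase strings to use as nonsense queries."""
    rng = random.Random(seed)
    queries = set()
    while len(queries) < count:
        queries.add("".join(rng.choice(string.ascii_lowercase) for _ in range(length)))
    return sorted(queries)

def leaked_fraction(planted, observed):
    """planted: query -> URL inserted by hand on the first engine.
    observed: query -> top URL seen later on the second engine (missing if no result)."""
    hits = [q for q, url in planted.items() if observed.get(q) == url]
    return len(hits) / len(planted)
```

With 100 planted queries and 7 matches, `leaked_fraction` returns 0.07, which is the ratio quoted elsewhere in this thread.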


What strikes me odd is why Microsoft would bother with making a sneaky toolbar that calls home instead of just grepping through their Bing logs for queries with no results and then running these queries against Google...


>What strikes me odd is why Microsoft would bother with making a sneaky toolbar that calls home instead of just grepping through their Bing logs for queries with no results and then running these queries against Google...

Because what you describe would be underhand and probably a violation of Google's ToS that could cost them very dearly in PR and money (in a court case).

They have access to user data that users say they can access (click-through or whatever) but they don't have access to Google search results directly unless Google allow this (which I've not checked but can't imagine they do allow).

The grey area is that Google's ToS relates to their relationship with their clients (people who search using Google), do they disallow their clients if they're using MSIE with tracking? Doubtful, if they did (by a technicality say) then they could sue their clients but they couldn't (a priori) sue MS as MS are acting in good faith in their relationship with the same clients (people using MSIE allowing data tracking). The onus would appear to be on the Google users not to have tracking enabled (if indeed Google's ToS disallow such things).

Why not just grep Bing logs ... well clearly they can tell a lot about the relevance to a particular term by seeing how long a user spends on a page after searching for that term and following a link. If the user bounces then it's not likely to be high quality. This sort of info won't be easily gathered from Bing logs if indeed it is possible to get at at all.
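A small sketch of the dwell-time idea, assuming the toolbar reports a timestamped sequence of page visits. The 10-second bounce threshold and the function names are arbitrary choices for illustration.

```python
def dwell_times(visits):
    """visits: chronological list of (timestamp_seconds, url).
    Returns (url, seconds_spent) for every page except the last, whose end time is unknown."""
    return [(url, next_ts - ts) for (ts, url), (next_ts, _) in zip(visits, visits[1:])]

def is_bounce(dwell_seconds, threshold=10):
    """Treat a very short stay as a bounce, i.e. a hint that the result was not useful."""
    return dwell_seconds < threshold
```

Server-side logs on the search engine only show the click leaving the results page; the time spent on the destination needs a client-side view like this.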


I imagine it's because they want the Google results as input even (and especially) in cases when there are indeed results.


Well, Windows doesn't come with grep, so...


Well I guess THAT joke fell flat. Oh well...


Seriously? Anybody else surprised at how little data there is given how serious the accusations are?

From what I've read, the general consensus seems to be that Microsoft is using IE in conjunction with the Bing toolbar to analyze users' search data. And this is something that worked only on 6 or 7 of the 100 terms that they tried it with? That was enough to incriminate Bing?

Google could've at least tested to see if this behavior is limited to just Google or if Bing was also analyzing other search engines (or even other pages). I would've expected MS to have released something like this, not Google.


Another example like this that has just come out - Qwiki by Eduardo Saverin, co-founder of Facebook. The new search engine pulls data for its results directly from Wikipedia and adds its own flair. The open source text comes directly from Wikipedia.

Qwiki - see contents http://www.qwiki.com/q/#!/George_VI_of_the_United_Kingdom Wikipedia http://en.wikipedia.org/wiki/George_vi_of_the_united_kingdom


Did they do the same test with blekko or duckduckgo too? It would be interesting to see if Microsoft is the only one doing this.

I would try it now, but the test has been polluted by all of the news articles.


What a very clever way to test their theory. I'm very curious to see how Microsoft explain this one.

Also, why do I now want to buy things with "hiybbprqag" printed on them?


It's not really that clever.. it's the search equivalent of a trap street in cartography.

http://en.wikipedia.org/wiki/Trap_street


As many have pointed out, Google has used clickstream data from their toolbar for a number of years.

Also, Google has used the links provided by hub and search pages to find relevant sites within a niche. They have happily indexed links they discovered on those pages, and then removed or penalized the pages that pointed them to it. It's OK, of course, because any SERP not provided by Yahoo, Google, or Microsoft is termed "spam".


"You can think of the synthetic queries with inserted results as the search engine equivalent of marked bills in a bank."

It's actually more like trap streets on maps.


While this is not completely Black and White, one thing which favours Bing is that a user has clicked on a search result which is determined using his own intelligence.

Google has assisted the user with that action. Bing is only correlating these two individual actions (search and click) by the user, to get some additional signals.


Never mind the ethics part. The fact that Bing's 'signal gathering' mechanism can be fooled into accepting some bogus links, without even an iota of content verification, illustrates a fatal flaw. I am sure developers at MS are capable people, but this throws poor light on them (well, at least in my eyes).


Well, at least Bing will get a pagerank boost from all the new backlinks in the press coverage.


How about this analogy? Suppose Company A publishes a book, and Company B somehow manages to get the content of the book from its readers (with their consent). Then B publishes the same content as if it were original content. How does that sound to you?


Think I'll just quote Mike Masnick's post over at techdirt, since he sums it up so well: "For Google to attack a competitor for using open information on the web -- the same way it does -- seems like the height of hypocrisy. It's fine for Google to crawl and index whatever sites it wants in order to set up its ranking algorithms, but the second someone looks at Google's own rankings as part of their own determination, suddenly its "cheating"?

This seems like the latest in a series of indications that Google has moved past the innovation stage into the "protecting its turf" stage. That would be a shame." http://www.techdirt.com/articles/20110201/11022312911/google...


They should have made one of the nonsense queries something like "westealourresultsfromothersearchengines" and linked it to something like yes.com just to make the copying easier for non-tech people to understand.


I may be misunderstanding what is happening, but is it possible Bing is just doing what it was programmed to do: examining the user's browsing habits? Is it necessarily directly and purposely targeting Google?


Do I understand this right? Google (over a period of time) knowingly submitted info to Microsoft. When Microsoft used that data, Google accused them of copying?

Does Google use the data I give them? Are they copying me?


Close, but you didn't understand it quite right. Google knowingly submitted info to Microsoft to confirm a suspicion that Microsoft was using google.com search results from other users to populate bing results.


In a way I see this click tracking that Microsoft is doing as innovation. What better way to get relevancy than directly from the user and what that user clicks on?


Google could have made this story all the more interesting by using misspellings of the terms nihilartikel and mountweazel in the honeypot queries.


Google should make their own toolbar that continually sends random bad data back to bing to screw up their results. I'd install it.


I think this is a very bad move by Google to get into these types of pointless arguments. With Google in the market leader position, there is nothing Google can gain from doing this. Google won't convert Bing users over with these accusations; all Google is doing is raising awareness for Bing. If anything, Google would lose users to Bing for doing this.

If anyone from Google is reading this: this is not a smart move, and it should be ended asap.


Take a look at feature list here: http://research.microsoft.com/en-us/projects/mslr/feature.as...

Combining features 130, 131, 135 and 136, I think it is understandable that what the Google engineers did could give those fake links a boost in the search results. In a way, they cheated the algorithm.


Is it possible Bing is pulling the results from a third party provider who is doing the cheating? Maybe a middle man is partially what's to blame for the delay in the results.


this is brilliant. all those years of "innovation" ruined by a simple MiTM tap.



Google, you are embarrassing yourself.


use hashtag #BingGate


I don't see anything wrong Bing is doing. There's clearly an indirect link between the synthetic query and Google result.

If Bing was outright stealing Google results, all you have to do is:

1. set up the synthetic queries on Google

2. search for them using Bing

Clearly, it took several weeks of Bing toolbar being installed and people going to site X after searching for Y. The Bing toolbar has the right to assume there's a relationship between X and Y. It's a legitimate "ranking" strategy.
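
To make the "people going to site X after searching for Y" signal concrete, here is a minimal sketch of how such pairs could be aggregated. It assumes the toolbar reports (query, clicked URL) pairs; the function name, the `min_clicks` threshold, and the example URLs are all hypothetical, and nothing here is claimed to be Bing's actual pipeline.

```python
from collections import Counter, defaultdict


def build_click_index(events, min_clicks=2):
    """Aggregate (query, clicked_url) pairs into per-query rankings.

    events: iterable of (query, url) tuples (a hypothetical clickstream log).
    min_clicks: hypothetical noise threshold; rarer pairs are dropped.
    """
    counts = defaultdict(Counter)
    for query, url in events:
        counts[query.strip().lower()][url] += 1

    index = {}
    for query, url_counts in counts.items():
        # most_common() orders URLs by how often users clicked them.
        ranked = [u for u, c in url_counts.most_common() if c >= min_clicks]
        if ranked:
            index[query] = ranked
    return index


events = [
    ("hiybbprqag", "http://example.com/seating"),
    ("hiybbprqag", "http://example.com/seating"),
    ("hiybbprqag", "http://example.org/other"),
    ("hacker news", "http://news.ycombinator.com/"),
]
index = build_click_index(events)
```

For a nonsense query that nobody else has ever typed, these clicks would be the only signal available, which would explain why a handful of clicks could be enough to surface a result.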


"Clearly, it took several weeks of Bing toolbar being installed and people going to site X after searching for Y."

... and even then it only worked for 7-9% of "synthetic" nonsense queries.


Well look, this thing comes preinstalled with Windows, so all the Google engineers did is go to www.google.com on IE, search for something and - voila a short time later the results are on bing.com.

Automated spidering or not, the way this is set up borders on stealing. I can see why they feel the need to complain about the issue.


That's not all they did:

"We gave 20 of our engineers laptops with a fresh install of Microsoft Windows running Internet Explorer 8 with Bing Toolbar installed. As part of the install process, we opted in to the “Suggested Sites” feature of IE8, and we accepted the default options for the Bing Toolbar."

Essentially, the engineers enabled the user tracking features of IE and the Bing Toolbar, ultimately seeding Bing with the desired results. How is that stealing?

On a related note, can this technique be exploited to improve site ranking on Bing?


> On a related note, can this technique be exploited to improve site ranking on Bing?

That was my thought when I heard of all this! I don't know what kind of authentication the bing toolbar does, but this seems ripe for reverse engineering, then pumping fraudulent data to Microsoft through a botnet...


The Bing toolbar doesn't come preinstalled with Windows.


IE will send this data even without the toolbar.


Link? I haven't heard this mentioned yet.


It's in the blog post: "Internet Explorer 8, which can send data to Microsoft via its Suggested Sites feature"


That's different data though. I thought you meant the Google search data.


The time it took Bing to show the results has nothing to do with the fact that they took them from Google's result pages. They clearly took the results from Google, or possibly from another source that took them from Google. Bottom line, the results came from Google.


In my view this is only a problem if IE8 / Bing Toolbar has a line which is effectively:

  if current_page == "www.google.com":
      # Steal results from competition
      pass

Otherwise, generically applying this on other sites is clever and innovative: it effectively turns every user into a web crawler with a built-in Mechanical Turk to rank useful links on that page.

I think Google are framing the debate well, and perhaps exposing something that isn't exactly privacy friendly, but to claim they are stealing from Google is similar to me claiming Google is stealing from me because they indexed my collection of favourite links on my homepage.


>but to claim they are stealing from google is similar to me claiming google is stealing from me because they indexed my collection of favourite links on my homepage.

If you've denied google access to your site in your robots.txt, then it would be a reasonable claim. (Google has denied bing access in its robots.txt)
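
For reference, here is a minimal sketch of what a robots.txt check looks like from a crawler's side, using Python's standard `urllib.robotparser`. The robots.txt contents and the `examplebot` / `otherbot` user agents are hypothetical, not the real contents of any site's file. Note that the check is keyed on a crawler's user agent; software that only observes a user's clicks never makes this check at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, not the real contents of any site's file.
ROBOTS_TXT = """\
User-agent: examplebot
Disallow: /search

User-agent: *
Disallow: /private
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks before fetching each URL.
search_ok = parser.can_fetch("examplebot", "http://example.com/search?q=test")
about_ok = parser.can_fetch("examplebot", "http://example.com/about")
other_ok = parser.can_fetch("otherbot", "http://example.com/search?q=test")
```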


I don't completely disagree, in that perhaps Bing should respect robots.txt when using information gathered in this way. However, the aim of robots.txt is to restrict the actions of an automated web crawler (bandwidth limiting and preventing unwanted interaction with dynamic pages). In this case the crawling is not carried out by a 'robot', so in my opinion there is room for interpretation.

I guess it's an interesting point: clearly, denying access to an area in the robots.txt suggests you do not have permission to use that information. However, only the TOS will be definitive on what you can and cannot do. For example, not including a robots.txt file clearly does not waive all rights, but interpreting the TOS is clearly beyond the capability of an automated system.

In this case I would suggest the only safe course of action for Bing would be to have an exclude list of domains and allow anyone to have their own site excluded from this information gathering.
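
A minimal sketch of such an exclude list, assuming the gathered data is a list of (referrer URL, clicked URL) pairs. The `EXCLUDED_DOMAINS` set and both function names are hypothetical and only illustrate the idea:

```python
from urllib.parse import urlsplit

# Hypothetical opt-out list; a real one would be maintained by the operator.
EXCLUDED_DOMAINS = {"example.com"}


def is_excluded(url, excluded=EXCLUDED_DOMAINS):
    """True if the URL's host is an excluded domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in excluded)


def filter_events(events, excluded=EXCLUDED_DOMAINS):
    """Drop (referrer_url, clicked_url) pairs whose referrer has opted out."""
    return [(ref, dst) for ref, dst in events if not is_excluded(ref, excluded)]


events = [
    ("http://www.example.com/search?q=x", "http://site-a.test/"),
    ("http://notexample.com/links", "http://site-b.test/"),
]
kept = filter_events(events)
```

Matching on the hostname with a leading dot keeps `notexample.com` from being treated as a subdomain of `example.com`.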


So? Bottom line, all search engine results come from somewhere else. When you search for Hacker News, it's not like Google knows what Hacker News is; it grabs the results from elsewhere and ranks them.

When there's only one result, it doesn't matter how you rank it, it will always be the only result.


"all search engine results come from somewhere else"

I believe the word you are looking for is "interweb".


Google should have kept quiet, and figured out how Microsoft is pulling data from Google. They could then create a script that would cause Bing to link new borrowed content to goatse.


Bing only sees what Google's customers are seeing, so Google can't do it without screwing up its own search results. Even worse, it won't affect the top 1 million query strings, since for them Bing is getting much stronger signals from other sources.


Why didn't Google's investigation go further? Why didn't they decompile the IE8 toolbar to figure out what it was really doing? Maybe that's against the DMCA and Google can't admit to it?

Having the evidence in code would have made the accusation irrefutable.


Agree. I have to imagine that someone at Google captured the toolbar's HTTP traffic. Maybe they're holding it back, or maybe it's the same for Google results as it is for, say, Lycos.



