Microsoft: 'We do not copy Google's results' (zdnet.com)
111 points by mjfern on Feb 1, 2011 | 74 comments



The statements by the Microsoft people remind me of "I did not have sexual relations with that woman."

They don't seem to realize that ordinary consumers don't care about this, and that the people who do care (e.g. investors, potential employees) are smart enough that these evasions just make the problem worse.


Why would potential employees take this negatively? This is exactly the type of thing I'd love to work on. Give me opt-in to your web browsing characteristics, and let me optimize your searches. Opt-in is the key.

I'm certainly an odd one here, but I'm more inclined to use the Bing toolbar due to this, although the perf cost is still too high to use any toolbar, IMO.

What would be nice is if the browsers had an open format of your browsing history that you can send to providers of your choice...


Because if Bing has to rely on Google Search's solutions to problems it should itself be able to solve, that implies it's inferior technology.


Perhaps 'inferior' in some dimensions, but trying harder with every scrap of legal data they can acquire. That's what I'd expect from any upstart against an entrenched leader, a leader with a long headstart and giant informational advantages via ubiquity and scale.

Google leverages every legal scrap of information they can get their hands on to improve their services – including the outlinks of every site they crawl, and clicks of users all over the net. If there's a legal way to observe the users and content of Google's sites for info about which other sites are good – and I haven't yet seen an allegation from Google that Microsoft is doing something illegal – why should any company refrain from using those signals, given Google's giant use of such info from elsewhere?


What I see is that Bing filtered out 93/100 of the deliberately planted seeds, and what I don't see is how many times Google sent each seed packet to Microsoft. If Google sent each packet through the Bing tool bar once, it is different than if they sent each one through 1000 times. [edit] In that case what Google did was exploit a feature of Bing toolbar in a manner somewhat akin to an injection attack. It is also not evident what Google tried that didn't work.[/edit]

Finally, what I find interesting is that Google basically broke what they claim to be their prime directive - not hard coding search rankings - for PR purposes. There isn't a good story behind that either way. Either approval came from the top, in which case senior management broke the directive to embarrass a competitor or it was broken by cowboys outside of management control. Of course it is also possible that this sort of thing is done regularly within the course of business.

[Aside] I want to be the first to dub this "Binggate."


"broke what they claim to be their prime directive - not hard coding search rankings"

If you consider setting up a honeypot on a term with no hits to really be going against the "not hard coding search rankings" in any meaningful way, you are most likely some sort of obscure Bing ultra-partisan.


Let's be clear, Google hard coded search results for PR purposes and generated the PR with a 20-engineer black hat operation specifically intended to inject the honeypots into Bing's results. It was not a clever little hack. It was a multi-man-year industrial espionage operation utilizing some of the best talent in the industry for the sake of a PR win.

The fact that Google's operation only succeeded with 7 out of 100 honeypots is telling evidence that Microsoft is explicitly trying not to do what Google is accusing it of doing given the scale of Google's operation.

This has nothing to do with Bing. Google's Prime Directive is their promise and no one else's. The fundamental nature of a promise is that breaking it is not excused by "the ends justify the means."

However, some people may consider a promise to be breakable under extraordinary circumstances. I won't disagree that such a belief can be considered reasonable. But the circumstances under which Google violated their Prime Directive are hardly extraordinary - dragging Microsoft through the mud at a search conference.


This is laughable. They chose keywords no one would ever see as part of a real search. They are literally nonsensical strings of random letters.


I don't think there are many people out there that don't believe Google's technology is superior.

Personally, as someone who's held MS stock for a while, I kind of find this tactic to be deviously brilliant. For a dog that's been back in the race for a while I'm okay with them leveraging every angle they have to catch up. There's a ton of mindshare to compete with at Google. This seems to be a fairly effective shortcut for Microsoft.

I think I'd see things your way more if this were two competing, resource-constrained startups. But if it's two 800lb gorillas going at it I'm going to have fun watching them trade blows and see how it turns out. Playing "fair" doesn't matter as much to me in this case.

But that's just my perspective - everyone will see this differently I'm sure.


Yes, it was clever, until Google caught them at it and started using it to undercut their "mindshare".


I put even money that this helps their mindshare. For people that use Google/Bing explicitly today -- no delta.

But for a lot of people this is going to sound like, "Google catches Bing cheating". And for a lot of people this is going to sound like, "Google complains about competitor Bing".

Given where Bing mindshare and marketshare are today versus its search quality, I think this isn't a horrible place to be. Especially if it gets people to say, "Let me try a query on Google and Bing and see who is copying?" or to simply see what the hoopla is about.

And Google made a mistake when they worked with Danny in getting this story out too. It should have been "Microsoft copies", not "Bing copies".


'"Google complains about competitor Bing"'

"Bing is too weak to stand on its own and it knows it, so it cheats."

I do not think the court of public opinion is going to let go of the word "cheating" so rapidly. It's a toxic word, and whether or not you think it's true it's supported well enough that it won't matter.


I think you overstate it. Ask your dentist about the MS antitrust case and if MS cheated in it. I bring up dentists, because way back when it happened mine gave me a monologue about how MS was being raked over the coals because of how good they were.

He didn't know anything about, in this case, actual fraudulent and illegal practices. And this was a much bigger story at the time than this is likely to be, by orders of magnitude.

Most people, to this day, think that MS was sued because they were a monopoly. Not because of illegal practices while a monopoly.

That it's Google bringing the charges, and not the DOJ, also weakens it, I think.

And lastly, the fact that it's not illegal, also makes it look like whining. Steroids are illegal and generally considered really bad, and the American public hardly cares about that. If Barry Bonds was a nicer guy the public may well have demanded they make steroids legal! :-)

My point, since I rambled, this will be more he said/she said. And less one sided than I think you believe.


It's not really relying on Google's search solutions at all. It's relying on users' search behavior. It's coincidental that they (users) use Google and that they happen to always click on the bogus result --when no other data are available.

Their method, when no data is available for a search term, is to sometimes (7 out of 100?) rely on human ranked results (in this case gamed by Google) via user behavior gathered with their toolbar.

It'd be interesting to find out if Bing was only using behavioral results of Google users or they recorded behavior by users of all search engines. Unfortunately, Google don't seem to have conducted an A/B test to find out, so it ended up with confirmation bias.


You put what I was thinking more succinctly than I could. I don't like the witch-hunt attitude when there's no proof that they directly focused their metrics on Google.


"rely"? It's using customer behavior as a signal. Someone typing into a search box and clicking a link is a pretty good signal, regardless of where the search happens.

This only implies inferior technology if this is the only signal you have in determining relevance, but clearly they have tons of them.

The fact that they're trying to capture the totality of the user experience is a very good thing (again with opt-in). Search relevance hasn't progressed as it should. But I think if they could use my web browsing habits it would increase a fair bit.

And lastly, Bing and Google are near parity, but not identical. I don't think anyone is under any illusions that neither engine could learn a thing or two from the other.

It's not about inferior or not, but trying to get better.


Exposing this practice has also introduced another potential problem for Bing. Gaming of their search results.

I would expect to see spammers sending many thousands/millions of clicks on specific Google search results through Bing's Toolbar for long-tail queries.


You'd first have to get the result to come up as a result of your search.

(Although I suppose you could hack the toolbar and send the data manually.)


They don't. They rely on tracking data on performing any search and the resulting click which occurs. Just because Google happens to be the middleman in some of these metrics, doesn't mean there should be a massive leap of logic to the statement: "Bing has to rely on Google Search's solutions".


> that implies it's inferior technology.

Search engines learn a lot from their users. Think of spelling corrections: some of them are learned from users who typed both the misspelled and the correct query in the same session.
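To make the spelling point concrete, here is a minimal sketch of how such corrections could be mined from session logs. It is an illustration only, not a description of how Google or Bing actually do it: the `sessions` data, the function names, and the `max_distance` and `min_support` thresholds are all assumptions made up for this example.

```python
from collections import Counter, defaultdict


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def mine_corrections(sessions, max_distance=2, min_support=2):
    """Map a query to the rewrite users most often typed right after it.

    Only consecutive queries within one session that are close in edit
    distance count as a candidate correction (hypothetical heuristic).
    """
    votes = defaultdict(Counter)
    for queries in sessions:
        for first, second in zip(queries, queries[1:]):
            if first != second and edit_distance(first, second) <= max_distance:
                votes[first][second] += 1
    corrections = {}
    for query, counter in votes.items():
        best, count = counter.most_common(1)[0]
        if count >= min_support:
            corrections[query] = best
    return corrections


# Made-up sessions: each inner list is one user's queries in order.
sessions = [
    ["recieve", "receive"],
    ["recieve", "receive"],
    ["recieve", "relieve"],
    ["weather", "news"],      # unrelated rewrite, too far apart to count
]

print(mine_corrections(sessions))
```

The larger the user base, the more sessions contribute votes, which is the sense in which more users can mean better corrections without any change in the algorithm.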

Like in this case, many times better results don't come from better technology, but from a bigger userbase that gives more data to learn from.

Google has almost an order of magnitude more users than Bing, hence an enormous amount of data to learn from.

IMHO what Bing is doing is a way to tap a (biased, incomplete) part of that data (that also Google has, in their clicklogs) to use it as a signal for crawling and ranking.


Yes it would imply that, if that were indeed what was going on.


Problems like returning meaningful results for "hiybbprqag"?


No, problems like returning meaningful results for "torsoraphy". See the first example that is provided in Danny Sullivan's post.


Not to rationalize away the similarity between nonsense words and search results, but there are legitimate reasons why gross misspellings might yield similar results. Among them are that similar algorithms may have been derived from similar libraries and academic literature.

With 25+ years in the word processor business Microsoft certainly has significant experience with spell check and algorithmically deciphering bungled words - The claim that Google has better code for deciphering jumbled spelling is not self-evident.


The suspicious part is that the "corrected" version only appeared in one link, the first. So it doesn't seem to have actually recognized the misspelling as such, but associated it with the Wikipedia page through some other mechanism. There's definitely a circumstantial case that it's through the Google clickstream, though I don't think that's the only plausible explanation (especially given that it's a Wikipedia page).


Right. Typically, Wikipedia would have a record of the search term being used to reach the page. Of course, it is plausible only if Wikipedia shares that information with others.


Moreover, Clinton's statement was an outright lie, not an evasion. And it didn't seem to hurt his approval ratings. The analogy makes no sense.


Apparently his justification was that he didn't consider what happened to have been "sexual relations," because he hadn't been what one might call an active participant.

The far extreme of evasion, certainly, but the narrowness of his definition of "sexual relationships" seems in the same ballpark as the narrowness of Microsoft's definition of "copy."


Also, Clinton is of a generation that grew up with the understanding that gentlemen don't "kiss and tell".


I think Clinton's actions were well within the common-language definition of "sexual relations". I don't think the Google "experiments" showed anything about Bing's behavior that falls within the common-language definition of "copy".


> Clinton's statement was an outright lie

He employed a very narrow definition of "sexual relations" that was made possible by the ambiguity of what a sexual relationship actually is.


It depends on what your definition of is, is.


Most people want to work in a company that innovates.

This isn't innovation. Any fool can write something that copies data from a competitor.


Like how Google copies the titles of web pages and snippets and puts them in search results?

Ahh... the innovation is not in the "content", but how the content is determined. Google crawls the web, copying content, creates an index, and a sophisticated formula for determining relevance of search.

MS has innovated by taking this one step further by saying, "if you opt-in we'll use your behavior as part of our index and relevance metric". So they no more copied from Google than Google copies from CNN. But in the same way Google has created value from mined data, Bing has created value from mined opt-in user behavior. That's the innovation.

If one can't see that, I think most of these places would do a "no hire" in any case.


With that logic you should definitely work at MS. It'd be a perfect fit.


I agree. I do think it indicates a cultural position, and to the extent that you don't fit culturally you should get a "no hire" to the benefit of both parties.

That is why I disagree that this is harmful to potential employees. I'd no sooner want to work for someone who thought this indicated inferior technology as they'd want me to work with them.


Just to address your 'logic' above though:

> "Like how Google copies the titles of web pages and snippets and puts them in search results?"

Google copies parts of web pages to benefit the website. To send traffic to them. It's beneficial to both parties which is opt-out by the webmaster.

How does Google benefit from Bing copying their search results? They don't. How can Google opt-out of Bing doing it? They can't.

Bing copying other search engines results is throwing down the gauntlet and stooping to desperation. It's MS announcing "we are not a technology company" - something I think would put off candidates who want to work at a technology company.

You might find it all morally acceptable, or even impressive. I don't.


Bing isn't copying the Google page. Bing is copying what the user typed in a form. And a link the user clicked. The benefit to the user is improved searches from Bing in the future.

Google can block Bing from crawling the search results, which it does. But Google does NOT OWN what the user typed, nor the URL the user goes to.

If MS is scraping the page that's one thing. But if MS is using information provided by the user that the user has given them permission to use, then it is completely fine.

I find it morally acceptable and impressive.

EDIT: Google could block users that have the Bing toolbar installed from using Google search.


I'm quite stunned that Google allowed this story to break.

It's such a bold claim, and with such an obvious explanation - Bing use all kinds of click data, whether from Google or not. Did they really not realise that was what was happening, or did they think "this is a certain PR win"? Or do they just not have control of what their people say in public?

It's like if MS had discovered googlebombing and used it to 'show' that Google were 'copying' from webpages, or worse assumed that Google were manipulating search results.

I think Harry was exactly correct to describe this as a new kind of click fraud, and Google's handling of their discovery of it just amazes me.


It does look like Google thinks this is a PR win. With their comments in the search engine forum and Matt Cutts' comments in the original HN thread, it is hard to believe that this is a "rogue" faction of some sort.


Obviously this depends highly on your definition of "copy". They presumably do not scrape Google's answers, but using user behavior patterns targeting search sites (or tracking all user data but analyzing the search segment) and what people clicked is just as bad.


Don't Chrome and the Google Toolbar also track clickstream data to refine search results? I assumed that's what they were asking my permission to phone home with, to 'improve their services'.

Do they gentlemanly ignore clickstreams on competitor web sites?


There's what you clicked on, and there's what you typed into a search box, hit enter, and then clicked on. The two are quite different.


It's wrong to associate the clicktrail to the content of the page the link came from? How in the world can you tune for search if you can't/don't include context?


How do you know that they target search sites? This could also be a more generic approach: If there's a link from page A to page B also return page B if someone searches for page A's keywords and there are only few results.


Have you read the original claim? Google did a test with completely nonsense words and 100% unrelated search results with those terms. The results showed up on Bing. Whatever Bing is doing, it is using Google's search data.

http://searchengineland.com/google-bing-is-cheating-copying-...


I didn't deny that they are using Google's data. But it could be possible that they have some kind of generic approach not specifically targeting Google (or even search engines).


Exactly, they could just observe what user clicks after search (on any search engine, including Bing) and use it as a signal.

If that is indeed the case, then they are not copying results from Google, they are just tracking users' choices to improve position of interesting results on Bing. Personally, I don't see anything wrong about that.

Keep in mind that in Google's experiment, they always clicked on the first result, which is probably why their result was sent to Microsoft and why the experiment "succeeded".


needn't even be limited to "what user clicks after search" could simply be "what user clicks"

They're trying to link Content A and Content B, where "Content" includes page content, URL (including GET string terms), and various other factors. Could happen on every page.

Google-fans then tend to get a bit knicker-twisty about privacy implications, but first I dislike that line of argument as it confuses the issue (what are we mad at, the copying or the privacy? Oh I know, we'll say copying and when someone rebuts that we'll start whining about privacy), and secondly, as patio11 eloquently said: Google doesn't really want to get into a heated discussion about the evils of a search engine knowing everything you've ever searched for. Stones, glass houses, etc. ( http://news.ycombinator.com/item?id=2165682 )


> Personally, I don't see anything wrong about that.

I, however, would call it spyware.


Yea sure but even the Google toolbar reports user behavior if you let it.


How long would it take for Microsoft to stage a similar trap for already installed Google toolbars? The fact it hasn't been done is a good indicator they couldn't.


One way of yielding the same effects without even factoring in search inputs would be to assume a probabilistic relationship between the words on an origin page (e.g. a Google search result page, or a bog standard web page) and the destination URL the user clicks through to. This seems like a pretty reasonable design parameter for the Bing Bar to learn "Suggested Sites" and if you're collecting that information anyway why wouldn't you add all those associations into your search engine to help it rank those tricky obscure search queries?

For synthetic words like "hiybbprqag" that rarely/never show up on the internet [except on Google search result pages generated by Google engineers systematically searching for it], the basic probability algorithm would weight heavily towards assuming an association between "hiybbprqag" and the destination URLs viewed by people immediately after looking at pages referencing "hiybbprqag". Since probably the only people looking at pages referencing "hiybbprqag" were Google testers searching for it, who had been instructed to always click on the "synthetic" Google result, the probability of someone viewing a page referencing "hiybbprqag" subsequently going to http://www.teamonetickets....wiltern-map.html would be close to 1 - suggesting a pretty strong association between the terms.

If you incorporated these associations into the Bing search engine in any way, it would be perfectly reasonable for a search engine to assume that that page is the most relevant result for hiybbprqag, given the lack of any alternative data on what to show.

Obviously this isn't as simple as the other solution (and Google could have done more sophisticated tests which rule out this kind of algorithm as being behind the results), but the competitive end of search isn't simple.
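A rough sketch of the word-to-destination association described in this comment, purely as an illustration. Nothing here is known to match what the Bing Bar does: the clickstream records, the `example.com` URLs, and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_associations(clickstream):
    """Count how often each origin-page term precedes a click on each URL.

    clickstream: iterable of (words_on_origin_page, clicked_url) pairs.
    """
    term_views = Counter()               # term -> number of origin pages seen
    term_clicks = defaultdict(Counter)   # term -> clicked URL -> count
    for words, url in clickstream:
        for term in set(words):          # count each term once per page view
            term_views[term] += 1
            term_clicks[term][url] += 1
    return term_views, term_clicks


def click_probability(term, url, term_views, term_clicks):
    """Estimate P(user goes to url | origin page contained term)."""
    if term_views[term] == 0:
        return 0.0
    return term_clicks[term][url] / term_views[term]


# Made-up data: two testers see the synthetic word and click the planted
# page; ordinary users see common words and click all over the place.
clickstream = [
    (["hiybbprqag", "results"], "http://example.com/planted"),
    (["hiybbprqag", "results"], "http://example.com/planted"),
    (["tickets", "results"], "http://example.com/planted"),
    (["tickets", "results"], "http://example.com/other"),
    (["tickets", "results"], "http://example.com/third"),
]

views, clicks = build_associations(clickstream)
for term in ("hiybbprqag", "tickets", "results"):
    p = click_probability(term, "http://example.com/planted", views, clicks)
    print(term, round(p, 2))
```

With so few observations the rare term ends up with an estimate of 1, while common terms are diluted by other clicks. A real system would presumably smooth these estimates or require minimum support, which is one way most planted terms could fail to surface.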


Right, but the nonsense word is part of the referring URL for the link that the user is clicking.
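For illustration, pulling the search terms out of a referring URL takes only a few lines. This is a sketch under assumptions: the parameter name `q` and the sample URLs are examples, and nothing here is taken from the toolbar's actual code.

```python
from urllib.parse import urlparse, parse_qs


def query_from_referrer(referrer, param="q"):
    """Return the search terms carried in a referrer URL, or None.

    `param` is the query-string key assumed to hold the search terms.
    """
    values = parse_qs(urlparse(referrer).query).get(param)
    return values[0] if values else None


print(query_from_referrer("http://www.google.com/search?q=hiybbprqag"))
print(query_from_referrer("http://example.com/some/page"))
```

Because the same parsing works on any URL with a query string, a collector written this way would not need to know it was looking at a search engine at all.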


Sure, the question is how you differentiate those actions. Someone typed in [x] and then clicked [y] could certainly be generic, but that would be a massive amount of data to crunch and would require a significant amount of congruency to be enough to modify the Bing algorithm.

How would it differentiate between a form submission and a search? And if it's not and still sending all of that data back to MS, that's far more disconcerting than them just watching me on Google.

It's certainly more believable that the sites were either targeted or were extracted from massive amounts of data for this purpose.


"... we embrace and extend them."


"We do not copy Google's results."

Which does NOT answer the claims really. No one is saying they COPY them verbatim. They're saying that they SOMETIMES 'look at' the Google results as a ranking signal. And in SOME cases (it seems when it's a fairly obscure search phrase), Bing displays the same result as Google does - as the sting operation showed.

This naturally doesn't mean that Bing copy Google 100%, of course. But that's not what today's news is about.

A very weak 'reply' from Microsoft which again seems to pretty much confirm that they ARE 'looking at' certain Google results.


It's getting blurbed as "Bing's results are a copy of Google's". The nuances you detail are buried halfway through Danny Sullivan's article and are not reflected in his takeaway in the conclusion.


> It's getting blurbed as "Bing's results are a copy of Google's"

Which is untrue. They are nowhere near as good.


This is a sloppy article. If you are going to quote a Microsoft blog post in part, do link to it. Does anyone have a link to the blog post in question from Harry Shum, Bing Corporate Vice President? The quote sounds pretty evasive but I would like to see the whole thing before I judge.

EDIT: I found the post:

http://www.bing.com/community/site_blogs/b/search/archive/20...

Reading the whole thing it seems like he pretty much admitted copying Google's search results through users of the Google toolbar. He calls it "collective intelligence."


I guess that's best possible defense for them at the moment.

"We do not copy Google's results (however, we do use them in 'creative' ways!)"


> "We do not copy Google's results (however, we do use them in 'creative' ways!)"

What's the difference to what Google does by e.g. using newspaper articles in a "creative" way (by not only showing them in the search results, but by combining them to a new type of newspaper at news.google.com)?


Bing attempts to heavily monetise their ventures. So basically they're taking away potential eyeballs. And Google did run into a similar issue with news.google.com: http://www.google.com/hostednews/afp/article/ALeqM5gRqFRzuaE...


I believe Ben Bradlee would call that a "non-denial denial".


"It’s a point of pride to Google that it believes it has the best spelling correction system of any search engine"

Indeed, my 11 year old sister loves typing in silly search terms such as "stroobaray pai" and laughing about the fact that Google actually figured out what she meant :-)


Pragmatically, the following scenario underlines why this practice is overall bad for the consumer.

  Company A produces product 1.
  Company B produces product 2 inferior to product 1.
  Consumers use product 1.
  Company B enhances product 2 by actively engaging product 1.
  Some consumers use product 2.
  Company B product 2 rises in market share.
  More consumers use product 2.
  Company A drops product 1 due to lack of profits.
  Product 2 quality deteriorates due to absence of product 1.
  Consumers left with inferior product option.


Creating a bullshit opt-out -- that they know damn well virtually no one will ever read -- and then having IE send apparently the entirety of a user's browse behavior to MS is sketchy as all hell. One more reason never to use an MS product.


This remains as relevant as ever...

"Bad artists copy. Good artists steal." -Picasso


Microsoft: "Music you buy from us Plays For Sure.(TM)"

(the above is just my interpretation).

Is there a pattern here?


As always, buried under the platform hate and apologists from both sides, the answer probably lies somewhere in the middle.



I know it is fun to categorically dismiss posts with a wikipedia link to some arbitrary fallacy. (Hey. I can do it myself: http://en.wikipedia.org/wiki/Argument_from_fallacy)

But he does have a fraction of a point.


I agree I should have written something to go along with the link, along the lines of "careful with your conclusions", to make my intention clear.

It is possible, however, to arrive at a right conclusion through incorrect reasoning.


MS caught on:

- optimization for SunSpider [1]

- now scraping Google SERP

Curious what is next? Linux code is in Win7?

[1] http://news.ycombinator.com/item?id=1913102



