How I got sued by Facebook (2010) (petewarden.typepad.com)
175 points by helwr on March 25, 2011 | 37 comments



"my lawyer advised me that it had never been tested in court, and the legal costs alone of being a test case would bankrupt me"

What's to stop two smaller companies from making a "court case" in which they sue each other for small sums, with the desired outcome being that following robots.txt is a legal way to access a site with a crawler? That would then set a precedent that benefits everyone else as a whole.
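To make the mechanics concrete: robots.txt only tells a crawler which paths it may fetch, per user agent. Below is a minimal sketch, using Python's standard `urllib.robotparser`, of a crawler checking a site's rules before requesting a page. The robots.txt contents, the `ResearchBot` agent name and the example.com URLs are hypothetical. The code shows compliance with the file and nothing more; whether that compliance carries any legal weight is exactly the untested question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, parsed from a string so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Rules in a group are checked in order; the first matching path prefix wins.
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/people/alice", "/private/settings"):
        url = "https://example.com" + path
        print(path, allowed("ResearchBot", url))
```

In a real crawler, `RobotFileParser.set_url()` followed by `read()` would fetch the live file instead of parsing a string.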


Testing a case in court is not that easy. The vast majority of litigated cases never result in precedent. Most cases are filed and disposed of with no written decision or only with a routine order or judgment that is of interest to the parties in the case only.

It usually takes in-depth briefing by sophisticated lawyers to generate a precedent of any meaning. This takes a lot of money and effort. Unless parties invest a lot into a case to give a court detailed guidance, the court will usually take the path of least resistance and dispose of the case on minimalist terms that have no precedential value for future cases.

In addition, courts do not permit collusive cases and are pretty good at sniffing them out.

In theory, one could file a low-level case and wind up with a helpful precedent. But this is highly unlikely. Of every 1,000 such cases filed, maybe 1 would have a slight chance of ever resulting in anything meaningful. That is just the reality of the litigation process. It takes a lot to generate meaningful precedents and that is why organizations that take on this burden (e.g., EFF) are needed to pour the vast resources into the process that it typically takes to get such results.


Now that you mention it, what's stopping two such companies from manipulating the result of this landmark case?

What's stopping Facebook from setting up a puppet company to sue to obtain their desired precedent?

This seems like a huge hole in the "let's let the courts decide the law" system.


Judges are not even remotely amused by attempts to manipulate them, and if they discover one, I can't imagine things ending well for the parties making such an attempt.


That's called "fraud" and there are pretty severe repercussions for that.


Lawsuit nastiness aside, there's an interesting and important legal-technical question that this exposes: how should websites specify acceptable uses of crawled data and other fine-grained restrictions in a machine-readable form.

Motivated by this incident, I got together with Pete (the author/victim) to write a piece on "The Need to Reboot Robots.txt" [1] but it went nowhere.

Any suggestions on how to give our proposal legs would be much appreciated.

[1] http://33bits.org/2010/12/05/web-crawlers-privacy-reboot-rob...


You can set the SyndicationRight directive for OpenSearch.

"Contains a value that indicates the degree to which the search results provided by this search engine can be queried, displayed, and redistributed."

The default is "open", meaning:
- The search client may request search results.
- The search client may display the search results to end users.
- The search client may send the search results to other search clients.

http://www.opensearch.org/Specifications/OpenSearch/1.1#The_...

That would give you more fine-grained control over what search agents do with your data. I don't know how broadly the OpenSearch spec is supported and adhered to (IMDB uses it).
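The SyndicationRight element lives in a site's OpenSearch description document. A minimal Python sketch of a client reading it, assuming the four values OpenSearch 1.1 defines (open, limited, private, closed) and the permissions the spec attaches to each; the description document and example.com URL are hypothetical. A missing element falls back to the "open" default quoted above.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

OS_NS = "http://a9.com/-/spec/opensearch/1.1/"


class Rights(NamedTuple):
    request: bool       # may the client request results?
    display: bool       # may it show them to end users?
    redistribute: bool  # may it pass them on to other search clients?


# Assumed mapping of the OpenSearch 1.1 SyndicationRight values to permissions.
RIGHTS = {
    "open": Rights(True, True, True),
    "limited": Rights(True, True, False),
    "private": Rights(True, False, False),
    "closed": Rights(False, False, False),
}


def syndication_rights(description_xml: str) -> Rights:
    """Read SyndicationRight from a description document (default: open)."""
    root = ET.fromstring(description_xml)
    elem = root.find(f"{{{OS_NS}}}SyndicationRight")
    text = (elem.text or "") if elem is not None else ""
    value = text.strip().lower() or "open"
    if value not in RIGHTS:
        raise ValueError(f"unknown SyndicationRight value: {value!r}")
    return RIGHTS[value]


# Hypothetical description document for a made-up search engine.
EXAMPLE = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example</ShortName>
  <Description>Hypothetical search engine</Description>
  <Url type="text/html" template="https://example.com/search?q={searchTerms}"/>
  <SyndicationRight>limited</SyndicationRight>
</OpenSearchDescription>"""
```

Like robots.txt, this is only a declaration: the client has to choose to honor the returned permissions.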


I wonder if he asked EFF whether they were willing to defend the case. The thing is, it's probably never worth it for an individual to defend against these cases, but at the level of society there would be so much to gain if someone had set a legal precedent for the validity of robots.txt.


They settled out of court:

> He was with the head of their security team, who I knew slightly because I'd reported several security holes to Facebook over the years. The attorney said that they were just about to sue me into oblivion, but in light of my previous good relationship with their security team, they'd give me one chance to stop the process. They asked and received a verbal assurance from me that I wouldn't publish the data, and sent me a letter to sign confirming that.

The robots.txt part was just Facebook's lawyers grasping for a contractual hook; the actual issue was that the EULA allows Facebook to make basic information available, but users do not expect such a database to be freely available. Although I personally regret that the data team does so little to help research, the legal consequences were not worth the prank; what actually matters is usage, and Facebook clearly polices the spirit of the platform rather than the law, even beyond its contract partners (see RapLeaf).


A case before the Supreme Court of Canada right now [1] touches on a similar untested premise of the open web. At issue is whether a hyperlink constitutes a citation or a republication of that page.

In this case, the plaintiff is accusing the defendant of defamation for linking to web pages the plaintiff argues are defamatory. (Aside: compared to the US, defamation law in Canada is weighted much more heavily toward the plaintiff than the defendant.)

Lower courts have decided that simply linking to a defamatory web page does not constitute defamation, unless the link is provided for the purpose of endorsing the defamatory material, in which case it is the endorsement of the link that constitutes defamation, and not the link itself.

The problem in Canada, as in the US, is that governments have not kept up with legislation governing the legality of various internet-specific activities, like hyperlinking. That has left the courts to try to decide through precedent how to handle these conflicts.

[1] http://www.scc-csc.gc.ca/case-dossier/cms-sgd/sum-som-eng.as...


You might care to read the extensive discussion from when this was posted 11 months ago:

http://news.ycombinator.com/item?id=1243159


I've added a post-script to this story, updating with developments over the last year: http://petewarden.typepad.com/searchbrowser/2011/03/facebook... In particular, I know from my friends in the academic community that they're quietly putting together processes for working with researchers. That's a big step forward in my view, as long as they can safeguard privacy, there's a lot of potential for world-improving research.


Great article. Is this the same person that Palantir mentioned as a potential source of Facebook information for social engineering attacks?

From the leaked HBGary emails:

"The Palantir employee noted that a researcher had used similar tools to violate Facebook's acceptable use policy on data scraping, 'resulting in a lawsuit when he crawled most of Facebook's social graph to build some statistics. I'd be worried about doing the same. (I'd ask him for his Facebook data—he's a fan of Palantir—but he's already deleted it.)'"

http://arstechnica.com/tech-policy/news/2011/02/black-ops-ho...


I notice they have updated their robots.txt to only allow user agents they have approved.

http://www.facebook.com/apps/site_scraping_tos.php
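An allowlist like that can be expressed with ordinary robots.txt rules: name the approved crawlers, then disallow everything for `*`. A minimal sketch using Python's `urllib.robotparser`; the file contents are hypothetical and are not Facebook's actual robots.txt, with Googlebot standing in for an approved agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allowlist-style robots.txt: named crawlers may fetch
# everything, every other user agent is shut out.
ALLOWLIST_ROBOTS = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ALLOWLIST_ROBOTS.splitlines())


def can_crawl(user_agent: str, path: str = "/") -> bool:
    """Check a path on a hypothetical site against the allowlist above."""
    # The first group whose User-agent matches wins; unnamed agents fall
    # through to the "*" group, which disallows everything.
    return parser.can_fetch(user_agent, "https://example.com" + path)


if __name__ == "__main__":
    for agent in ("Googlebot", "ResearchBot"):
        print(agent, can_crawl(agent))
```

Because unnamed agents only ever reach the `*` group, any crawler not listed in the file, whether a research crawler or an archive bot, is refused.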


I noticed that a while back when it had the effect of removing thefacebook.com from the Wayback Machine. There used to be some fascinating reading in their old TOS and Privacy Policy. Wish I'd kept a copy.


What would have happened if he had done it from a company based in the Seychelles for example? Would that be a way to protect against Facebook aggressively suing with no grounds?


My guess is that Facebook would still sue and obtain an injunction against the site being distributed in the US, probably shutting down access in the US.


But how can they do this if the DNS server and the web server are not hosted in the US?

I've been wondering about this problem too recently when looking at some frivolous patent lawsuits... The problem is that in quite a few cases, even if the law is on the side of the startup, the cost of applying the law and winning the lawsuit is too high...


If your domain name is registered through a US company, they may be able to seize it.

Also, if you have any assets in the US they can be seized.


[2010]

I thought it sounded familiar.


Also, the title is misleading: he got threatened with legal action, not sued.


So...anyone have a mirror of the data?


google for "fbnames".


This was, in fact, tested (to a limited extent) in court about a decade ago. See eBay v. Bidder's Edge, 100 F.Supp.2d 1058 (N.D. Cal. 2000).

Short story: Back in the days when there was actual competition in the online auction market (anyone remember Yahoo! Auctions?), Bidder's Edge was crawling eBay listings to index them for an auction search engine. (I worked for one of their competitors.) eBay sued on a trespass theory, and was granted a preliminary injunction because the judge held that eBay was likely to succeed on the merits of the claim.

Unfortunately, the trespass claim was never fully litigated; Bidder's Edge agreed to stop crawling after the PI was granted.


Reminds me of how Facebook almost sued suicidemachine.org [1] just because it let people commit online suicide from Facebook (unfriend everyone and set a random password).

For me, Facebook is just another bigheaded company that is trying to turn your social life into its product [2]. And that is not the place where I want to hang out with friends online. (And I don't.)

[1] http://suicidemachine.org/download/Web_2.0_Suicide_Machine.p...

[2] http://twitter.com/#!/librarythingtim/status/13226541303


Someone convince me what facebook said here was wrong. I don't think robots.txt gives you a license to do whatever you want with web content. If it did wouldn't robots.txt effectively put everything into the public domain?


Google can mine the data and do whatever they want (and I don't doubt for a second that they run analysis on it), but this guy can't?

Facebook wants to have their cake and eat it too. They want free Google publicity, but god forbid some dude starts downloading pages for research. It's legal, but it's wrong.


I'm really only interested in the legal question. And I genuinely would like to be convinced that the legal system would allow scraping like this. I just don't see it.


Gathering information from a website is very different than publishing it verbatim.

I am not a lawyer, but I think it comes down to website Terms of Service enforceability. I don't know what the precedents are, but I would guess that a TOS that went against the nature of how the web is reasonably expected to operate would not stand up in court.

I don't think it's a copyright issue; facts are not copyrightable and by using the data in his research he's not using their presentation of the data.


Thanks for that; it at least convinces me that it's not cut and dried. I think you have a good point about the copyright aspects particularly.


Well, you never got sued just threatened.


As the founder of a new company and the son of a lawyer, I certainly think about lawsuits. It seems all companies that become well known eventually face them. While it sucks and you never want to face one, many know it is a cost of doing business. You also find people who attack a company because they see a big dollar sign in front of them. Plus, lawyers might earn hundreds of millions, or dare I say billions, if they win a case against a company like Facebook or Google.


When the Twitter strategy docs got leaked a while back, there was a specific section that dealt with potential lawsuits.

http://techcrunch.com/2009/07/16/twitters-internal-strategy-... (see Defensive Strategy section)

Legal

- We will be sued for patent infringement, repeatedly and often

- Should we get a great patent attorney to proactively go after these patents (We need to talk about this more, we are unsatisfied)


...but those millions may take years to win. Every lawsuit has an opportunity cost. Yes, if you get to a certain level that is the cost of doing business, but if you are just out of the gate and have X dollars and Y hours in the day, it may be a poor investment. Also, there's a difference between the disruptive cost of upsetting an old industry (say, YouTube or Napster) and fighting a single company's walled garden. At the end of the day, building a company on top of another company's walled garden is high-risk to begin with, so it could be better to just move on.


The world really could use better analytics tools for Facebook apps since the ones that Facebook provides are a little sorry in my opinion.


Sorry but I side with Facebook, a freely available public graph of millions of users could have been used for re-identification attacks.

Frankly you should never share your friends list publicly.


I'd bet money that a similar list is probably already floating around. Whether it's freely available or not may be up for interpretation.



