I contacted a company to buy Sex Offender data. I called twice and then sent 3 e...

lsb · on Aug 9, 2010

Remember, they can't copyright facts, no matter how much sweat of the brow it took to compile those facts. Recall Feist v Rural, wherein someone made another phone book: http://en.wikipedia.org/wiki/Feist_Publications_v._Rural_Tel...

dotBen · on Aug 9, 2010

In the US this is true; in the UK it isn't.

A great case is the UK Post Code system (which is much more specific than the US zip code system here - no more than 15 letter-boxes share a unique post code in the UK).

A number of people have created Post Code to lat/long conversion tables. They're all facts, but because they have been worked out they are copyrightable. To deter copying, the providers introduce slight variances in the data so they can track duplication, and they usually jump on anyone they catch.

I don't believe that would fly here in the USA.
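For the curious, here is a rough sketch of how the variance trick could be checked. The post codes, coordinates, and the `planted_overlap` helper are all made up for illustration; this is not any real provider's method.

```python
# Hypothetical sketch: every post code and coordinate below is invented.
def planted_overlap(planted, suspect, tolerance=1e-6):
    """Return the fraction of deliberately altered rows that reappear in `suspect`."""
    if not planted:
        return 0.0
    hits = 0
    for code, (lat, lon) in planted.items():
        if code in suspect:
            s_lat, s_lon = suspect[code]
            # A match on a deliberately wrong value is hard to explain by chance.
            if abs(s_lat - lat) <= tolerance and abs(s_lon - lon) <= tolerance:
                hits += 1
    return hits / len(planted)


# Rows the provider nudged on purpose (assumed values).
planted = {"ZZ1 1AA": (51.500123, -0.120456), "ZZ2 2BB": (53.480789, -2.240321)}

# A table that reproduces the nudged values, and one measured independently.
copied = {
    "ZZ1 1AA": (51.500123, -0.120456),
    "ZZ2 2BB": (53.480789, -2.240321),
    "ZZ3 3CC": (55.95, -3.19),
}
independent = {"ZZ1 1AA": (51.500100, -0.120400), "ZZ2 2BB": (53.480800, -2.240300)}

copied_score = planted_overlap(planted, copied)
independent_score = planted_overlap(planted, independent)
```

A high score on the planted rows suggests copying; a low one suggests the other table was worked out separately.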

aristus · on Aug 10, 2010

So... buy a copy of all of them and throw away the diffs? :D
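Half-seriously, "throw away the diffs" might look something like the sketch below: keep a post code only when enough tables agree on it. The `consensus` helper, the rounding precision, and the sample data are assumptions for illustration.

```python
from collections import Counter


def consensus(tables, min_agree=2, places=4):
    """Keep a code only when at least `min_agree` tables give the same rounded value."""
    result = {}
    codes = set().union(*(t.keys() for t in tables))
    for code in codes:
        # Count how many tables report each (lat, lon) pair after rounding.
        votes = Counter(
            (round(t[code][0], places), round(t[code][1], places))
            for t in tables
            if code in t
        )
        value, count = votes.most_common(1)[0]
        if count >= min_agree:
            result[code] = value
    return result


# Invented data: table b disagrees with a and c on the second code.
a = {"ZZ1 1AA": (51.5001, -0.1204), "ZZ2 2BB": (53.4808, -2.2403)}
b = {"ZZ1 1AA": (51.5001, -0.1204), "ZZ2 2BB": (53.4899, -2.2403)}
c = {"ZZ1 1AA": (51.5001, -0.1204), "ZZ2 2BB": (53.4808, -2.2403)}

majority = consensus([a, b, c])
unanimous = consensus([a, b, c], min_agree=3)
```

With `min_agree=2` the outlier in `b` is outvoted; with `min_agree=3` the disputed code is dropped entirely.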

macca321 · on Aug 10, 2010

buying them all would defeat the purpose

notahacker · on Aug 10, 2010

An even more unusual UK example would be the "copyright" claimed over the football fixture lists you'll find on any popular sport site - they're supposed to pay a large licensing fee to a company called Football Dataco even if they source the data independently.

nailer · on Aug 9, 2010

Depends on the country - but yes for the US you're OK.

steveklabnik · on Aug 9, 2010

Is screen scraping really unethical? I mean, you've sent me a bunch of text... whatever I do with it is my business.

mattmaroon · on Aug 9, 2010

The scraping is generally not illegal, though it may violate ToSes. It's just that people who scrape generally republish, which may or may not be illegal.

"Ethics" in cases like this are so subjective that it's almost useless debating them.

vaksel · on Aug 9, 2010

it is to some people...a) you are breaking the ToS b) you are stealing their content to resell/republish it as your own

zepolen · on Aug 9, 2010

What if I never read or even open the ToS and proceed directly to screen scraping your site?

Afaik there is no law that mandates you must read/agree to the ToS before loading a website. In fact, what if the ToS states 'If you visit this site, you must pay us 1000 dollars', are you now liable? Despite the fact that just to read the ToS you had to visit said site.

My point is that only a court will decide what happens, and just because something is in the ToS doesn't mean it gives you carte blanche to do whatever you please.

nailer · on Aug 9, 2010

Sure, ToS are crap - they're not contracts and any 'extra rights' the site gets are indeed not worth the electrons they're printed on. But copyrighted content is protected by good old actual law.

cliveholloway · on Aug 10, 2010

Somebody grab Jason Calacanis for a meaningful soundbite :)

jasonmcalacanis · on Aug 10, 2010

Actually, I'm very ethical. People around me tell me I take it to an absurd level sometimes.

It's funny how public perception of this can be driven by folks who have never met you or done business with you. You would be hard pressed to find anyone in business, an investor, a partner, etc. that would ever say anything to the contrary. Thus the reason I'm able to raise money, partner, sell companies, etc. over and over again despite the fact that I get negative comments/blog posts from haters.

There is reality, and there are comment threads. :-)

roboneal · on Aug 10, 2010

Every search engine is a screen scraper. Google has been making $$$ off of re-published content since day zero.

random42 · on Aug 10, 2010

No.

a.) Google follows robots.txt. You can disallow Google from indexing your website. Most websites, OTOH, want Google to index them.

b.) Google does not republish the content. All the traffic is directed to the content owner, i.e. the other websites.
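As a rough illustration of point a.), a well-behaved crawler can check robots.txt before fetching anything. This sketch uses Python's standard `urllib.robotparser`; the robots.txt contents, the bot names, and the example.com URLs are made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone is kept out of /private/,
# and a crawler calling itself ExampleBot is kept out entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network request is needed here.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, url):
    """Return True if `agent` may fetch `url` under the rules parsed above."""
    return parser.can_fetch(agent, url)


public_ok = allowed("OtherBot", "https://example.com/index.html")
private_ok = allowed("OtherBot", "https://example.com/private/data")
banned_ok = allowed("ExampleBot", "https://example.com/index.html")
```

Against a live site you would call `set_url()` and `read()` instead of `parse()`. Nothing forces a scraper to run this check; it is a convention the crawler chooses to honor.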

roboneal · on Aug 12, 2010

So you are saying that each search result page is handwritten by Google editors?

Of course they are not; they are simply re-published snippets of the websites, and Google surrounds the results with ads.

While it's a symbiotic relationship that most websites want - sharing their content for placement in Google's webpages - it's not necessarily universal and ROBOTS.TXT is hardly a "contract" covering your data's usage.

ddemchuk · on Aug 9, 2010

to the owners of the content, it's similar to scanning a book at a library then reselling it without their permission

jerf · on Aug 10, 2010

Resist metaphors for this sort of thing. Resist them strongly. There are no physical analogues for the internet that are close enough to matter. When I scan a book page, the author and publisher are so untroubled they don't even know, indeed, they can't know. When I look at a web page, they have to actually serve it to me, costing them some amount of money. Money you can only call trivial when you aren't scraping pages by the thousands. This is enough to render your metaphor irrelevant by virtue of significant, both in the sense of "sizable" and in the sense of "relevant", difference.

There's no physical analogue to scraping a database-driven site. The question of exactly how legal it is probably won't be settled until that fact penetrates the legal system; until then it depends on what metaphor you sell to a given judge.

ddemchuk · on Aug 10, 2010

but you left out the part of my metaphor that included selling the scanned book later on. That is where the issue can become public. That is where you're affecting the livelihood of said authors/publishers.

I'm really playing devil's advocate here, I do very large amounts of scraping online for various projects and do not bat an eyelash at what I'm doing. If it's online, it's there for the taking. If you don't want me to scrape it, hide it.

jerf · on Aug 10, 2010

That doesn't save it, though. You misinterpret me as doing precisely what I'm advocating against; I'm not making an argument by metaphor. I'm simply observing yours doesn't work. It only has to not work in one way for it to be invalid. That doesn't make your conclusion wrong, it just makes your argument invalid.

In fact I agree with you that it is broadly speaking incumbent upon someone who does not want to be scraped to have at least some protections technically enforced, however feebly, and you should not go out of your way to violate such protections however feeble they may be. But I come to that conclusion thinking about the monetary issues and bandwidth issues and ethical issues directly, not by making a bad analogy to people leaving doors open or locks on gates in the middle of the field or houses constructed out of glass.

ddemchuk · on Aug 11, 2010

no, despite the rhetoric you employed here, you're still skirting my rebuttal and lacking any real point other than to show off how many words you can use to describe absolutely nothing.

My point is that someone works to create something that is freely available to peruse (website content = book at library), and anyone who comes in and copies and sells that content will make said author upset.