I contacted a company to buy Sex Offender data. I called twice and then sent 3 emails. After a month of no replies, I wrote a program that screen-scraped their site.
A great case is the UK Post Code system (which is much more specific than the US ZIP code system here: on average only about 15 letter-boxes share a post code in the UK).
A number of people have created Post Code to lat/long conversion tables. The entries are all facts, but because they have been worked out they are copyrightable. To catch copying, the providers introduce slight variances into the data so they can track duplication, and they usually jump on anyone who republishes it.
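For anyone wondering how those slight variances get used, here is a rough sketch of the idea. Everything in it is invented for illustration: the post codes, the coordinates, and the `find_copied_traps` helper are hypothetical, and real providers presumably do something more elaborate. The provider keeps a private list of deliberately altered entries and checks whether a suspect table reproduces them.

```python
# Hypothetical illustration: all post codes and coordinates below are invented.
TOLERANCE = 1e-6  # how closely a coordinate must match to count as copied


def find_copied_traps(trap_entries, suspect_table, tolerance=TOLERANCE):
    """Return the post codes whose deliberately altered coordinates
    also appear (within tolerance) in the suspect table."""
    copied = []
    for code, (lat, lon) in trap_entries.items():
        other = suspect_table.get(code)
        if other is None:
            continue  # the suspect table does not list this code at all
        if abs(other[0] - lat) <= tolerance and abs(other[1] - lon) <= tolerance:
            copied.append(code)
    return sorted(copied)


# Entries the provider shifted slightly on purpose (kept private).
trap_entries = {
    "ZZ1 1AA": (51.500123, -0.124567),
    "ZZ2 2BB": (53.480456, -2.242789),
}

# A table published by someone else.
suspect_table = {
    "ZZ1 1AA": (51.500123, -0.124567),  # identical to the altered value
    "ZZ2 2BB": (53.4808, -2.2426),      # independently measured, differs
    "ZZ3 3CC": (55.95, -3.19),
}

print(find_copied_traps(trap_entries, suspect_table))  # ['ZZ1 1AA']
```

An independently surveyed table would be very unlikely to hit the altered values exactly, which is why a match is treated as evidence of copying.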
An even more unusual UK example would be the "copyright" claimed over the football fixture lists you'll find on any popular sport site - they're supposed to pay a large licensing fee to a company called Football Dataco even if they source the data independently.
The scraping is generally not illegal, though it may violate ToSes. It's just that people who scrape generally republish, which may or may not be illegal.
"Ethics" in cases like this are so subjective that it's almost useless debating them.
What if I never read or even open the ToS and proceed directly to screen scraping your site?
AFAIK there is no law that mandates you must read/agree to the ToS before loading a website. In fact, what if the ToS states 'If you visit this site, you must pay us 1000 dollars', are you now liable? Despite the fact that just to read the ToS you had to visit said site.
My point is that only a court will decide what happens, and just because something is in the ToS doesn't mean it gives you carte blanche to do whatever you please.
Sure, ToS are crap - they're not contracts and any 'extra rights' the site gets are indeed not worth the electrons they're printed on. But copyrighted content is protected by good-old actual law.
Actually, I'm very ethical. People around me tell me I take it to an absurd level sometimes.
It's funny how public perception of this can be driven by folks who have never met you or done business with you. You would be hard-pressed to find anyone in business, an investor, a partner, etc. that would ever say anything to the contrary. That's why I'm able to raise money, partner, sell companies, etc. over and over again despite the fact that I get negative comments/blog posts from haters.
There is reality, and there are comment threads. :-)
So you are saying that each search result page is handwritten by Google editors?
Of course they are not; they are simply re-published snippets of the websites, and Google surrounds the results with ads.
While it's a symbiotic relationship that most websites want - sharing their content for placement in Google's webpages - it's not necessarily universal and ROBOTS.TXT is hardly a "contract" covering your data's usage.
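For reference, this is roughly all a robots.txt file amounts to from the scraper's side: an advisory file that a polite client chooses to consult. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt contents, the `example-bot` user agent, and the example.com URLs are made up for illustration, and `build_parser` is just a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Crawl-delay: 10
"""


def build_parser(robots_text):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved scraper asks before fetching each URL.
print(parser.can_fetch("example-bot", "https://example.com/search?q=x"))
print(parser.can_fetch("example-bot", "https://example.com/about"))

# The requested pause between requests in seconds, if the site states one.
print(parser.crawl_delay("example-bot"))
```

Nothing here is enforced by the server. A scraper that never runs this check gets the same pages, which is the sense in which the file is a request rather than a contract.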
Resist metaphors for this sort of thing. Resist them strongly. There are no physical analogues for the internet that are close enough to matter. When I scan a book page, the author and publisher are so untroubled they don't even know; indeed, they can't know. When I look at a web page, they have to actually serve it to me, costing them some amount of money. Money you can only call trivial when you aren't scraping pages by the thousands. That difference is significant, both in the sense of "sizable" and in the sense of "relevant", and it is enough to render your metaphor irrelevant.
There's no physical analogue to scraping a database-driven site. The question of exactly how legal it is probably won't be settled until that fact penetrates the legal system; until then it depends on what metaphor you sell to a given judge.
But you left out the part of my metaphor that included selling the scanned book later on. That is where the issue can become public. That is where you're affecting the livelihood of said authors/publishers.
I'm really playing devil's advocate here. I do very large amounts of scraping online for various projects and do not bat an eyelash at what I'm doing. If it's online, it's there for the taking. If you don't want me to scrape it, hide it.
That doesn't save it, though. You misinterpret me as doing precisely what I'm advocating against; I'm not making an argument by metaphor. I'm simply observing yours doesn't work. It only has to not work in one way for it to be invalid. That doesn't make your conclusion wrong, it just makes your argument invalid.
In fact I agree with you that it is broadly speaking incumbent upon someone who does not want to be scraped to have at least some protections technically enforced, however feebly, and you should not go out of your way to violate such protections however feeble they may be. But I come to that conclusion thinking about the monetary issues and bandwidth issues and ethical issues directly, not by making a bad analogy to people leaving doors open or locks on gates in the middle of the field or houses constructed out of glass.
No, despite the rhetoric you employed here, you're still skirting my rebuttal and lacking any real point other than to show off how many words you can use to describe absolutely nothing.
My point is that someone works to create something that is freely available to peruse (website content = book at library), and anyone who comes in and copies and sells that content will make said author upset.