The web makes things interesting in that OP's hypothetical company only has that data because Facebook willingly gives it to everyone who asks. It would be obviously wrong if they were using some exploit to trick Facebook's servers into divulging secrets.
Yeah nah, that's where the concept of agreements comes in. You walk up to Fes Boock and say:
― I want to have business with Fes Boock.
― Fes Boock will have business with you if you promise to not stab Fes Boock in the back.
― I give my word to not stab Fes Boock in the back.
Turns out, this thing is so valuable, it's supported by law everywhere that I know of, in multiple forms, including rather implicit ones such as “ToS.” Which is what allows Fes to sue the stabbing bastard.
To my knowledge, making an HTTP GET request and receiving a document in response does not involve agreeing to any ToS, implicitly or otherwise. If the server didn't want to send the data over an unauthenticated channel, then why does it send the data?
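For what it's worth, the entire semantic content of such a request can be written out. A sketch of the raw bytes an anonymous client sends (host and path are placeholders):

```python
# Sketch: the raw bytes of an anonymous HTTP GET request.
# No field in this message presents, references, or accepts
# any terms of service; it is only a request for a resource.
request = (
    "GET /some/page HTTP/1.1\r\n"
    "Host: example.com\r\n"        # placeholder host
    "Connection: close\r\n"
    "\r\n"                         # blank line ends the request
)
print(request.encode("ascii"))
```

Whatever agreement exists has to come from somewhere outside this exchange, which is the whole question.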
My mailbox opens and closes for my mailman to collect outgoing mail and deposit incoming mail. But anyone can open it. That doesn't mean I want them to, or that they are allowed to. But if my mailbox doesn't want to allow access to private information, then why does it open for unauthorized individuals? Because physically securing it would be a pain in the ass, most people are honest, and if I can keep my mail safe through force of law and social contract, that's easier for everyone, including legitimate users of my mailbox (myself and my mailman).
Your argument holds for mailboxes because it is not a common use case of mailboxes that their owners want complete strangers to check them as often as possible because they've left something they want taken.
A better real world analogy is a bulletin board on campus or a wooden power pole.
Let's suppose that it is super common for people to staple flyers to power poles, with the expectation that people will read them as they pass by. Your analogy would claim that if I staple a letter to the power pole, expecting that only the friend I told about the letter should read it, then passers-by are doing something unseemly by reading it, while surrounded by want ads and for-sale flyers that people do want read.
Websites are nothing like mailboxes. The vast majority of websites would prefer that as many people as possible read their contents as much as possible. Email would be a better analogy.
A request is communication with certain semantic content, which pulling on a mailbox handle lacks. There is no general understanding among people nor specific agreement between you and some other party that pulling on your mailbox handle is how to ask you for access to your correspondence.
This is not the case for HTTP. A network protocol is an agreement about the meaning of certain clusters of bytes sent over a network. When someone operates an HTTP server, a reasonable person could conclude that they take HTTP messages to mean what HTTP says they mean. A lot of cases get more interesting because there is also something generally understood to mean, "Please don't access the following resources by automated scraping, independently of whether my server decides to grant those requests."
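Presumably that "something" is robots.txt. As a sketch, Python's standard library can evaluate such a policy (the rules and URLs below are made up for illustration):

```python
# Sketch: evaluating a robots.txt policy with the stdlib.
# The rules and URLs are made-up examples.
from urllib import robotparser

rules = [
    "User-agent: *",
    "Disallow: /private/",
]
rp = robotparser.RobotFileParser()
rp.parse(rules)

# The request for /private/ may well still be *granted* by the
# server; the policy expresses what the operator asks of crawlers.
print(rp.can_fetch("MyScraper", "https://example.com/index.html"))  # True
print(rp.can_fetch("MyScraper", "https://example.com/private/x"))   # False
```

Which illustrates the point: the refusal lives at the level of convention, not at the level of what the server will technically answer.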
I'm pretty sure that a server, being a stupid piece of inanimate junk, is unable to enter any agreements or disagreements. In contrast, people, being endowed with free will supported by the ability to reason, need to apply said will and reason when directing actions of pieces of junk, so as to follow the same procedures of inter-party conduct as in direct interaction.
Since a web server, by its primary mode of operation, does indeed more or less indiscriminately send replies to whomever makes a request, it follows that the duty of choice lies with the client. The person operating the client has to apply their reason and follow the norms of inter-party conduct.
> Since a web server, by its primary mode of operation, does indeed more or less indiscriminately send replies to whomever makes a request, it follows that the duty of choice lies with the client.
Sorry, why doesn't the duty of choice lie with the server owner, who chooses to put the server online in the first place? What exactly are these rules you think exist? This is the first time I've ever heard of them.
> Since a web server, by its primary mode of operation, does indeed more or less indiscriminately send replies to whomever makes a request,
This is completely false. The server owner can authenticate GET requests and return an unauthorized response if the client is not permitted to access the document. We are not talking about a situation where a hacker attempts to brute-force a password or gain unauthorized access to a server. If the server is on the internet serving anonymous GET requests with no authentication, the reasonable assumption is that anyone is permitted to access the data.
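Requiring auth on a GET is a choice the operator can make in a few lines. A minimal sketch using Python's standard library (the bearer token and paths are made up for illustration):

```python
# Sketch: a server that *chooses* to gate GET requests.
# Requests without the (made-up) bearer token get 401 Unauthorized.
import threading
import http.client
from http.server import BaseHTTPRequestHandler, HTTPServer

class GatedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("Authorization") != "Bearer s3cret":
            self.send_response(401)   # refuse: client not permitted
            self.send_header("WWW-Authenticate", 'Bearer realm="demo"')
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"the document")

    def log_message(self, *args):     # keep the demo quiet
        pass

# Run on an ephemeral port and poke it once, anonymously.
server = HTTPServer(("127.0.0.1", 0), GatedHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/anything")
anon_status = conn.getresponse().status
print(anon_status)  # 401: the server declined the anonymous request
server.shutdown()
```

So serving a document to an anonymous GET is itself a choice; what the operator actually wants gated gets a 401 instead of the document.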
Well, if you think that it would be more reasonable and expedient to require users to read a contract beforehand and then authenticate themselves to the service before accessing any content―please, knock yourself out on your site.
It appears that the rest of the web gets by pretty well using the legal framework I've described. Because, you know, they tend to choose what's pragmatic over what merely "can be done."
Sure, but web scraping is a thing, and one that shouldn't be illegal. Therefore if data is public, it should be assumed to be... well, publicly accessible.
Do this. Create a fake company and say you wrote a spider to index Facebook public profile data and that you have like say 100GB ....
Watch how fast you get sued by Facebook.
Mind you this is public data that EVERYONE can see...