At a previous SaaS startup I worked for, we stored a bunch of clients' documents somewhere on e.g. example.com/documents, which included confidential items like checks and contracts. Customers would typically opt out of requiring authentication for a lot of these documents so they could share them with others.
I noticed that when you searched for the company on Bing, you would actually see a bunch of these documents, despite nothing linking to them!
Of course I updated the robots.txt and yelled at leadership for the gaping security hole, but I was very surprised to see that Microsoft would send every link you crawled back to Bing to index. Distributed web crawling!
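For anyone wanting to sanity-check a robots.txt change like the one mentioned above, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `bingbot` user agent string, and the example paths are assumptions for illustration; the post does not show the actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; the real rules are not shown in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /documents/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Assumed example URLs, checked against the hypothetical rules above.
    print(is_crawlable(ROBOTS_TXT, "bingbot", "https://example.com/documents/contract.pdf"))
    print(is_crawlable(ROBOTS_TXT, "bingbot", "https://example.com/pricing"))
```

Note that robots.txt only asks well-behaved crawlers not to fetch those paths; it does not protect the documents themselves.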
>Of course I updated the robots.txt and yelled at leadership for the gaping security hole
I'm curious how there was a security hole when a client opted out of requiring auth? If the client wants them publicly available then there was no security hole.
Secondly, I am wondering why you are yelling at leadership for something that seems like it was your responsibility.
the basis on which the user opted out of auth was that they believed the links were obscure/private enough to be "non discoverable" (enough)
for example let's say your link is `example.invalid/documents/samcea45pwmcwwn325ewaruvon4pepwrm8euwawuvuer8u` and there is no non-authenticated index/listing available
under normal circumstances you could argue that this long id is comparable to a "simple" shared password, i.e. knowing it is a very weak form of authentication, except it doesn't have the same degree of protection wrt. storage, logs etc. But good enough for non-public non-secret data.
that is, until your browser, without you knowing it or explicitly agreeing to it, starts creating that index which shouldn't exist _and_ pushes it to a search engine...
(or you have a virus infection which installs a link scraper; now that I think about it, Edge pretty much acts like a virus in this case, lol)
Similar links are used all the time for account setup or password reset, too.
There is nothing wrong with them, and intentional malware would likely be able to scrape whatever you additionally add to secure a shared link without a password.
There is _a lot_ wrong with what Edge is doing.
If Edge were hardware, it would need to be destroyed in some countries because it would count as an unauthorized spying device (but that law was never updated for the digital age).
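To make the "long id as a weak shared password" idea from this comment concrete, here is a rough sketch of how such a link could be generated and checked. `BASE_URL`, `new_share_link`, and `token_matches` are hypothetical names for illustration, not anything from the startup's actual code.

```python
import hmac
import secrets

# Hypothetical base path, mirroring the example link in the comment above.
BASE_URL = "https://example.invalid/documents/"


def new_share_link() -> tuple[str, str]:
    """Create an unguessable token and the share URL that embeds it."""
    # 32 random bytes from the OS CSPRNG, encoded as URL-safe base64.
    token = secrets.token_urlsafe(32)
    return token, BASE_URL + token


def token_matches(presented: str, stored: str) -> bool:
    """Compare tokens in constant time to avoid leaking a match prefix."""
    return hmac.compare_digest(presented.encode(), stored.encode())


if __name__ == "__main__":
    token, url = new_share_link()
    print(url)
    print(token_matches(token, token))
```

The token is only as private as every place the URL ends up (browser history, logs, and anything the browser reports elsewhere), which is the weakness the comment describes.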
The "security hole" was that anyone could search inurl:example.com/documents and get five pages of results with SSNs, credit card numbers, etc., plus the reputational damage of leads doing any amount of research into the company and seeing confidential documents on page 2 or 3 by literally just searching the company name. The startup was big enough that the data and reputational risk was easily 7 figures/yr.
> I'm curious how there was a security hole when a client opted out of requiring auth? If the client wants them publicly available then there was no security hole.
It's possible they wanted the link to be easier to share with very specific people, but not necessarily be something found on Bing.
Not really, no. That came about more from people claiming to have good security, but not disclosing their security practices and many of them turning out to be rather insecure.
Many products (Google Docs, YouTube, Office 365, Dropbox, etc.) allow sharing things via unguessable URLs; it's a standard practice that was safe, until browsers and browser extensions decided it was okay to send private URLs to other parties.
I would not be surprised if the EU steps in at some point and fines them heavily for it.
There are a large number of services (Dropbox, YouTube, Google Docs, Office 365, etc.) that use unguessable URLs for sharing and hence clearly don't share your idea of what it means to be "public" on the internet.
Oddly, Bing seems to like PDFs more than other pages. I have a project that generates PDFs for logged-in users. I see a handful of Bing/MSN IP addresses that keep on trying to view that page every day, but MS has never tried to index other pages that need user authentication. So either Microsoft is really eager to index PDFs or my code that logs unauthenticated access attempts doesn't work on other pages.