This document is confidential (encrypted.google.com)
358 points by ssclafani on Aug 17, 2010 | 44 comments



This really isn't that big of a deal. If you look through some of the results, you see things like blank forms (where "Confidential" refers to the information filled in by the applicant), boilerplate, attachments as part of public filings (where the document may have been confidential but came out in a trial) etc.

In one case, the "this document is confidential" is a phrase taken from the sentence "nothing in this document is confidential." I'm working on a "confidential" document right now, but it's intended for litigation and will likely show up in a records search in a year or two.

There may be something juicy in here, but you're gonna have to go through a whole bunch of the mundane to find it.


So much confidentiality, yet so little desire to actually read any of these documents. I hear Area 51 is cool, though.

Public is the new private - if you're hiding something, I'm not interested. If it's in plain view, I feel especially cool for seeing it.


Changing it to site:gov.uk doubles the number of results.


Replace confidential with one of:

PROTECT COMMERCIAL, PROTECT PERSONAL, RESTRICTED

You have to sort through a bit, but there's plenty there.


If you think this search gets plenty of results, try searching for "XYZ Confidential", where XYZ is a Fortune 500 corporation.



Most of these "confidential" documents are presentations made by Google employees to other companies to sell AdWords.

A peek into the documents reveals the huge investment Google makes in Product Marketing Managers, who do industry analysis and sell products, predominantly AdWords.

http://static.googleusercontent.com/external_content/untrust...

http://google.inxshare.de/01_Automotive/Google_AutomotiveMar...

http://www.in.gov/tourism/pdfs/Compete-Google_Travel_Economy...

http://www.mednet-tech.com/pdf/MedNet%20Webinar_11%2018%2020...


Ha, you get even more interesting results if you change the domain to .mil.


What's the significance of "https://encrypted.google.com"?



I haven't browsed it in about five years, but the Google Hack Database (http://www.hackersforcharity.org/ghdb/) had a lot of this kind of thing.


What did surprise me was that when I changed it from gov to mil there were only 7 results. Expected more from the military. ;)


I think Wikileaks hosts those that are missing in the .mil ...


No.... :) I hear fake 25 yr old female analyst profiles on Facebook work well. :)


Holy hell... snooping for classified data aside, searching for documents on any historically significant topic on site:mil may now be my new hobby.


Please don't confuse the US military with the US government. In the military if you can't do the job you're assigned, you get reassigned. If it doesn't do what it's supposed to do it gets axed. In the government...


... the FBI will accidentally declassify military communique on the reproduction of Tesla's experiments: http://bit.ly/aT5AZ0


Well, that was a waste of time.


try:

filetype:rtf | filetype:ppt | filetype:pptx | filetype:csv | filetype:xls | filetype:xlsx | filetype:docx | filetype:doc | filetype:pdf "this document is classified" site:mil

EDIT: Actually most of this stuff is false positives, but I'm sure you can fine tune the search query.

EDIT 2: This is quite good, for UK (I'm a UK Citizen):

filetype:rtf | filetype:ppt | filetype:pptx | filetype:csv | filetype:xls | filetype:xlsx | filetype:docx | filetype:doc | filetype:pdf "this document is confidential" site:gov.uk
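
For anyone who wants to fine-tune queries like the two above without retyping the long filetype list, here is a minimal Python sketch. The helper names `build_query` and `search_url` and the base URL are assumptions made for illustration; only the operators (`filetype:`, `site:`, `|`, quoted phrase) come from the queries in this comment.

```python
from urllib.parse import urlencode

# File types copied from the query above.
FILETYPES = ["rtf", "ppt", "pptx", "csv", "xls", "xlsx", "docx", "doc", "pdf"]


def build_query(phrase, site, filetypes=FILETYPES):
    """Join filetype: terms with '|' (OR), then add the quoted phrase and a site: filter."""
    type_clause = " | ".join(f"filetype:{ext}" for ext in filetypes)
    return f'{type_clause} "{phrase}" site:{site}'


def search_url(query, base="https://www.google.com/search"):
    """URL-encode the query as the q parameter (the base URL is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    q = build_query("this document is confidential", "gov.uk")
    print(q)
    print(search_url(q))
```

Swapping the phrase or the site argument reproduces the other variants suggested in this thread.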


Just visit Wikileaks...


Title of this item should be changed to "Eternal September has arrived."


Apple and Google turn up nothing, but Microsoft...

https://encrypted.google.com/search?hl=en&q=filetype%3Ar...


See also:

?intitle:index.of? mp3

"# -FrontPage-" inurl:service.pwd

intitle:"Index of" config.php

Not to mention the credit card number hacks... :)


The MP3 one doesn't work nearly as well anymore. Spammers figured it out too.


They will CAPTCHA you if you go too far with SSN/CCN regexes.


Or just about anything involving complicated number-range searches. I got bot-blocked while looking for something a month or two ago.


intitle:"Index of" "config.inc" -"config.inc.php"

I'm confused as to why people do this. Surely they must see the implication: web servers generally don't parse files with a .inc extension as PHP, so the file ends up being displayed as raw text.
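
A rough way to check your own web root for this problem is sketched below in Python. It assumes the server only hands files ending in `.php` to the interpreter (real configurations vary), and the function name `exposed_php_sources` is made up for this example.

```python
import os
from typing import List

# Assumption: only these extensions are executed by the PHP interpreter.
PHP_EXTENSIONS = {".php"}


def exposed_php_sources(web_root: str) -> List[str]:
    """Return files that contain PHP code but would be served as plain text."""
    exposed = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext in PHP_EXTENSIONS:
                continue  # the server would execute these, not display them
            path = os.path.join(dirpath, name)
            with open(path, "r", errors="ignore") as fh:
                if "<?php" in fh.read():
                    exposed.append(os.path.relpath(path, web_root))
    return sorted(exposed)


if __name__ == "__main__":
    for path in exposed_php_sources("."):
        print("served as raw text:", path)
```

Under that assumption, `config.inc` is flagged while `config.inc.php` is not, which matches the `-"config.inc.php"` exclusion in the query.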


In a previous life, I worked on a site that had the password and everything for the database connection in a "config.inc". I changed it.


Haha. Who needs wikileaks?


Anyone who wants to read stuff more interesting than water permits being issued in Montana or business fluff like "Developing a Transparent Business Case that builds true Accountability" :-)


oh noes, you must be a hacker!

On a different note, this is not surprising in the least (although quite clever ;).



Really interesting stuff -- thanks for reposting here.


And this is data that managed to seep through the cracks of Google's scrubbing. I would imagine the raw index from the spiders is far more interesting.


It sounds asinine since it requires the creator of documents to make the mistake in the first place, but is there possibly a useful service to be made for detecting sensitive documents of a company in a public location (public URI location, not physical location)? Is there nothing more useful than a google search? Just brainstorming.
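
As a starting point for that brainstorm, the detection step might look something like the Python sketch below. It is only an illustration: the marker phrases are the ones used in this thread, the function name `find_markers` is invented, and the single negation rule exists purely to skip the "nothing in this document is confidential" false positive mentioned elsewhere in the comments. Fetching and text extraction from the public URIs are left out entirely.

```python
import re

# Marker phrases taken from the searches in this thread.
MARKER = re.compile(r"this document is (confidential|classified)", re.IGNORECASE)
# Skip hits that are immediately preceded by "nothing in".
NEGATION = re.compile(r"nothing in\s*$", re.IGNORECASE)


def find_markers(text):
    """Return the marker phrases found in text, ignoring negated ones."""
    hits = []
    for match in MARKER.finditer(text):
        prefix = text[:match.start()]
        if NEGATION.search(prefix):
            continue  # e.g. "nothing in this document is confidential"
        hits.append(match.group(0))
    return hits


if __name__ == "__main__":
    print(find_markers("NOTE: This document is confidential."))
    print(find_markers("Nothing in this document is confidential."))
```

A real service would still need a human to review hits, since blank forms and boilerplate trip the same phrases.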


take out 'this document is confidential', add SSN. a lot of false hits, but some genuine government employee records too. also the Bush-era White House visitor logs, which are just kinda interesting to skim through


are there any search engines out there that ignore robots.txt, that this could be done with?


Drop the site:gov from the query for some more interesting results.


...What's Wikileaks, again?


this of course led me to search for TOP SECRET:

http://www.google.com/#hl=en&source=hp&q=filetype%3A...


which of course led me to a search for "EYES ONLY"

https://encrypted.google.com/search?num=100&hl=en&sa...


you could try NOFORN


http://www.cabinetoffice.gov.uk/media/cabinetoffice/corp/ass...

probably it is officially released though, since it is /media/.../asset


The path contains the folder foi (freedom of information) so someone will have made a request for the document and they are obliged to release it to them if it meets the criteria.



