Google has also indexed thousands of publicly accessible Panasonic webcams (tlrobinson.net)
183 points by tlrobinson on Jan 25, 2013 | 59 comments



ok, i just googled

https://www.google.com/search?q=inurl%3A%22viewerframe%3Fmod...

10 600 results

when you click on one result

e.g.: 202.212.193.26:555/CgiStart?page=Single&Mode=Motion&Language=0

then you see in the head of the frameset (and similarly in every framed html document)

  <META NAME="robots" CONTENT="none">
  <META NAME="robots" CONTENT="noindex,nofollow">
  <META NAME="robots" CONTENT="noarchive">
so basically, these HTML abominations would not get indexed if google followed these indexing directives (which google basically invented themselves)

google is evil? nope - they really follow these directives.

so why is this indexed?

take a look at

http://202.212.193.26:555/robots.txt

  User-Agent: * 
  Disallow: /
the robots.txt is a crawling directive: google can't crawl the (current) version of these pages, so it never sees the indexing directives. but as crawling is optional for indexing URLs, the URLs get indexed anyway.

how could this be solved? well: either get rid of the robots.txt, or use

  User-Agent: * 
  Disallow: /
  Noindex: /
the noindex robots.txt directive is specified nowhere, but it works nonetheless.
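
fwiw, the crawl side of this is easy to check, something like the sketch below (python stdlib urllib.robotparser, host taken from the example above, purely illustrative):

    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("http://202.212.193.26:555/robots.txt")
    rp.read()  # fetches and parses the robots.txt shown above

    url = "http://202.212.193.26:555/CgiStart?page=Single&Mode=Motion&Language=0"
    # False: "Disallow: /" blocks crawling, so the noindex meta tags on the
    # page are never fetched -- yet the URL itself can still be indexed
    # from external links.
    print(rp.can_fetch("Googlebot", url))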


Can you elaborate on how crawling is optional for indexing? Isn't crawling a prerequisite to indexing?

The only exceptions I can think of are scary, like operating a caching proxy and scraping the cached data. Or scraping data from browsers that have loaded pages by user request.


You can discover a URL through finding a link to it on a publicly-accessible web page, even if crawling that link itself is not possible.


Ohh, got it, thank you. So Google is aware that the URL exists, even though they know nothing about the content served at that URL.

I am just surprised that a URL with no associated content would be included in the index.

But now that I think about it more, why not? It will not show up except in extremely specific searches, and in those cases it is useful to the searcher.


I find this behavior annoying. Here's why:

https://www.google.com/search?q=unicorn+admin

4th result down (wbpreview.com) is shown in search results despite blocking crawling/indexing with robots.txt. The result displays "A description for this result is not available because of this site's robots.txt – learn more" and the title seems to be auto-generated. The goal was to de-index the listing but apparently that's not an option.


As franze pointed out, you can specify not to index in robots.txt (I have not confirmed this). The intent of disallowing crawling is ambiguous. Maybe they do not want their content cached, or the extra load on their server, or any number of reasons. If you need to de-index a site, you should use the Noindex robots.txt directive. If it has already been indexed and you need it de-indexed quickly, Google offers tools to do so [1]

[1] http://support.google.com/webmasters/bin/answer.py?hl=en&...


Thank you for pointing that out to me.


The way to prevent a site from being indexed at all is through a <meta name="robots" content="noindex,nofollow"> tag on the page or X-Robots-Tag HTTP header (both of which, ironically, require that you not robots.txt it out, because otherwise the page content will never be crawled), or through a Noindex directive in robots.txt (which is unspecified by the spec - Google supports it, but Yahoo and Bing don't).
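
For illustration, a minimal sketch (Python stdlib only, hypothetical handler) of serving a page with that X-Robots-Tag header -- note the page has to remain crawlable, or the header is never seen:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class NoIndexHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = b"<html><body>please do not index this</body></html>"
            self.send_response(200)
            # header equivalent of <meta name="robots" content="noindex,nofollow">
            self.send_header("X-Robots-Tag", "noindex, nofollow")
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8000), NoIndexHandler).serve_forever()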


"ok, i just googled [...] 10 600 results"

Most of which, to be fair, seem to be descriptions of this exploit, or pages listing open cams. The number of cams actually accessible is a fraction of those, and the number unintentionally left open a smaller fraction again.


I haven't updated the list of cameras in a few years, and seem to have lost my script to do it. I'll see what I can do later.


if you can find those cams via a google search, doesn't that mean they are linked from some other public site (which has also been indexed), which would indicate they were left public intentionally (at least most of them)?

If I set up a web-accessible cam without password protection, how would google find it? It's a crawler, right? It doesn't just try random IPs and connect to them.

I was always under the assumption that there is a pretty big part of the internet which is just not indexed by the major search engines (thus more or less private).


Google is amazingly good at digging up sites out of nowhere. I wonder if it is a combination of URLs passing through Chrome, GMail, any android phone, and so on. It's always a hassle keeping staging/dev sites out of the index if you're not careful with all the right meta noindex and robots.txt tags. (robots.txt with disallow all, on its own, won't keep sites/URLs from showing up in the results; at best it just hides the cached body text summary below the link)


> I wonder if it is a combination of URLs passing through Chrome, GMail, any android phone, and so on.

That would be incredibly alarming, and quite possibly the largest breach of trust perpetrated by a company so far this decade.


With the Bing toolbar installed in IE, any URL you type or visit is submitted to Microsoft, and they actively use this data to tune their Bing search results.

http://www.wired.com/business/2011/02/bing-copies-google/

I agree it'd be alarming and terrible, but hardly a new development.

Edit: it's doubtful that an e-mail provider would automatically fetch links from e-mails -- think about them clicking 'unsubscribe' links and links to reject the transfer of domain names. It would break in very obvious ways. IMs and texts, on the other hand, might be more opaque to that kind of meddling.


It'd be interesting to set up a wildcard dns *.some-experiment.example.com, and send various http://links-via-gmail.some-experiment.example.com/somepath , http://links-via-skype.some-experiment.example.com/anotherpa... through a bunch of services, and see which domain names and which full URLs show up in the logs!


That's a really good idea.

I see it getting quite complicated, though! Dimensions I see are: User's OS, User-agent, ISP/Cell carrier, Transmission protocol (smtp, xmpp, http), service provider (google, microsoft/skype, microsoft/msn).

    android.att.xmpp-gtalk.example.com
    android.verizon.http-gtalk.example.com
    win8.verizon-fios.https-gtalk.example.com
    ios.sprint.skype.example.com
Then you might have to also include the sender AND receiver information in the domain, so based on a single request you could see all possible implicated parties.

I also thought about putting the sender in the path of the URI, but I think it should be in the domain name, too. This is because you might get a hit on robots.txt and in that case, you'd only have one half of the route in the domain name.

Finally, including everything in the DNS lets you evaluate whether the name was even resolved, and potentially by whom. Getting a hit that the name was resolved but not fetched over HTTP gives you information about which services might be analyzing links in order to queue them for further investigation.
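
Roughly something like this, maybe (all names made up, zone borrowed from the some-experiment.example.com example above):

    # Sketch of building the tagged hostnames: one DNS label per dimension,
    # plus sender and receiver, under the wildcard zone -- so even a bare
    # DNS lookup (no HTTP request) reveals the full route in the query logs.
    def tagged_host(sender, receiver, os_name, carrier, protocol,
                    zone="some-experiment.example.com"):
        labels = [sender, receiver, os_name, carrier, protocol]
        return ".".join(labels + [zone])

    print(tagged_host("alice", "bob", "android", "att", "xmpp-gtalk"))
    # alice.bob.android.att.xmpp-gtalk.some-experiment.example.com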


Good call on logging DNS, that'd be a very nice early indicator even if no HTTP requests are sent!

I think maybe the domain should be of the format "www.encodedonlywithatoz.yourdomain.com" to maximize the chance that whatever regex parsers are scanning for URLs will pick it up (i.e., a www. prefix, a .com suffix, and no special chars). You could encode the dimensions via a lookup table to make it less verbose and slightly more obfuscated ("aa" = at&t, "ab" = verizon, etc).

You shouldn't expect data in the path info to be preserved, but it'd be a nice bonus, as you say.

Even more interesting would be some custom DNS software that replies with perhaps a CNAME or something, where you could encode a unique serial number per request. If you had a huge IP range available, you could even resolve to unique IP addresses for every lookup, so you could correlate DNS requests with any HTTP requests that show up later on. A low/near-zero DNS TTL would come in handy.
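
Something like this for the encoding side, perhaps (lookup table and letter-serial scheme are made up):

    import itertools
    import string

    CODES = {"att": "aa", "verizon": "ab",      # carrier
             "android": "ba", "ios": "bb",      # OS
             "gtalk": "ca", "skype": "cb"}      # service

    _serial = itertools.count(0)  # unique number per generated link

    def _letters(n):
        # render the counter with a-z only, keeping the label "onlywithatoz"
        out = ""
        while True:
            n, r = divmod(n, 26)
            out = string.ascii_lowercase[r] + out
            if n == 0:
                return out

    def honeypot_host(carrier, os_name, service, zone="example.com"):
        code = CODES[carrier] + CODES[os_name] + CODES[service]
        return "www.{}{}.{}".format(code, _letters(next(_serial)), zone)

    print(honeypot_host("verizon", "android", "gtalk"))  # www.abbacaa.example.com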


I like the idea of encoding the data. Or it can be like a URL shortener, where the metadata gets recorded, and a short hash is generated. It complicates the back-end but allows for more comprehensive data storage, and eventual reporting.
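
Roughly like the sketch below (hypothetical, in-memory store; a real setup would persist to sqlite or similar and do the reporting off that):

    import hashlib
    import json
    import time

    links = {}  # short hash -> metadata (the "eventual reporting" side)

    def make_link(metadata, zone="example.com"):
        # include a timestamp so each individual send gets its own link
        record = dict(metadata, ts=time.time())
        key = hashlib.sha1(
            json.dumps(record, sort_keys=True).encode()).hexdigest()[:8]
        links[key] = record
        return "http://www.{}.{}/".format(key, zone)

    print(make_link({"sender": "alice", "receiver": "bob", "channel": "gmail"}))
    # e.g. http://www.1f3a9c0b.example.com/  (hash value illustrative)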

Regarding custom DNS software, I might draw from this excellent write-up featured on HN recently:

http://5f5.org/ruminations/dns-debugging-over-http.html


Nice find!

Also, it'd be interesting to just crank the log level to maximum on a normal piece of DNS software, and post some links around in IM clients and elsewhere, just to see if anything anywhere kicks in. The experiment could be repeated (on different subdomains) with more clever implementation tricks later.


I ended up just setting up bind with a wildcard entry, and setting its log level for queries to debug. It is working now, but I need to build a little web app to generate the unique links. Also only one DNS server is running at the moment.

I can't wait to send some around in facebook messages and IMs.

Here's a maiden honeypot link: http://hn0001.hnypot.info/Welcome-Internets!

...Though posting it publicly nearly guarantees I will see a hit, I can at least see if code running on HN resolves it immediately.

Edit: There is activity coming in on that name, but mostly it is from browsers pre-loading DNS to prepare for the next potential pageview. My browser did this (chrome on Mac). I suppose that is a form of information disclosure we often overlook. On a page you can inject a link into, you can get some very basic analytics.

In the 15 minutes following the posting of that link, there have been zero clicks, 36 IPv4 lookups, 6 IPv6 lookups.
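
For tallying those, something like the snippet below works against bind's query log -- assuming the usual "query: <name> IN <type>" fragment bind writes with query logging on (exact log layout varies by version, and the log path here is just a guess):

    import re
    from collections import Counter

    QUERY_RE = re.compile(r"query: (\S+) IN (A|AAAA)\b")

    def tally(logfile, suffix="hnypot.info"):
        # count A vs AAAA lookups for names under the honeypot zone
        counts = Counter()
        with open(logfile) as fh:
            for line in fh:
                m = QUERY_RE.search(line)
                if m and m.group(1).rstrip(".").lower().endswith(suffix):
                    counts[m.group(2)] += 1
        return counts

    # e.g. tally("/var/log/named/queries.log") -> Counter({'A': 36, 'AAAA': 6})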


Go for it! That sounds like a great idea.

Make sure that your results can be tracked and provide as much information as possible, and you've got a nice project here.


Maybe I will, but if anyone else feels like putting in the effort, go ahead, too :)


I just registered hnypot [dot] info for a few bucks and will see if I can get wildcard DNS running with some tracking. Haha, I don't want to type the name as a link until I get the tracking going...

If anybody wants to collaborate or you just want an NS delegation off that name to try to roll your own, just let me know!


It's possible (even likely) that what you typed is enough for a crawler to try that site.


It is a short name, so it's likely that it'll be found. But the real honeypot would be the large hashed subdomains that you would use as bait.

I don't think the main site or its www subdomain would need to be secret. Of course, if it uncovers some huge invasion of privacy, we might have to set up an army of different domains running similar software on separate IPs to keep it effective.


When you type into the chrome omnibox, it sends your keystrokes to Google to give you search suggestions. Just using those would be sufficient, and I don't think they really hide that it's sending your input along. If someone does a Google search for a URL, we expect it to get added to the index; why is it different when the search occurs in the omnibox rather than their web interface?


URLs don't go through google when you type them in Chrome's URL box, they go directly to the address. Chrome only sends things to Google that it can't interpret as a URL.


Actually it looks like URLs do go to google's suggestion service as you type into Chrome's URL box.

I tried typing "http://then and it suggested a UPS package URL and "thenicestplaceontheinter.net". I've not been to either of those pages before (I use Chrome for testing, so I'm not signed into it, etc).


I can tell you with certainty that, if you capture packets while using Google Chrome's address bar for URLs, it does not send the data to Google.


Good point. Come to think of it, the last time I had to deal with this, it could have been caused by any number of ways the URL got in front of Google, such as embedding the Google Analytics .js even on the staging site, or running a test with the Google PageSpeed tools, etc. But they certainly have a HUGE amount of opportunity to snap up new URLs across all their services.


Some IRC networks or channels make their logs publicly accessible - so if you're showing off your parakeet on camera one day, that URL might be logged.


People sometimes leave their site logs public, which I suspect means google could find links to private pages in the referer logs.

(Of course, people should be relying on more than obscurity to hide their private pages! - but I believe people have been known to make mistakes and/or not know what they're doing.)


So this has actually been a pretty common 4chan prank for a while now. People give you instructions for Googling webcam IP addresses (very typically just an IP string or something similar) and then try to find something worth sharing. Just like the printer post above, it's incredible how many webcams are left completely unsecured.


I'm pretty confident this actually predates the founding of 4chan. I mean, it's not rocket science: if there is a device that can be administered via the web, then you can probably find at least a couple unsecured via a web search.


What you wouldn't have, though, is a large group of people dedicated to actively monitoring what is being captured.


Don't be silly. USENET? IRC?


>Just like the printer post above, it's incredible how many webcams are left completely unsecured.

My guess is that making a webcam publicly accessible is more likely to be intentional than doing the same with a printer.

Of course, if the webcam is for security, then probably not. But people installing security cams should be expected to know what they are doing (LOL!).


Also on the German krautchan. Watching some Japanese gas station attendant receiving weird stuff via fax was hilarious.


I know I probably shouldn't ask, but did you tape it?


This is a similar website that gathered webcam data using Shodan (http://www.shodanhq.com):

http://cryptogasm.com/webcams/


lol, someone's taking an exam http://cryptogasm.com/webcams/webcam.php?id=36739

awesome resource, connects you to the world in a weird way :)



Heh, that's an IP from Chile. Looks like some sort of outdoors classroom or something.


Looks like Heisenberg is ready to cook:

http://207.68.47.143:8080/anony/mjpg.cgi


I couldn't agree more


In school I had done a little research project to see if there were opportunities to map out these cameras and use them for disaster response scenarios. It seemed like a promising approach but I don't believe my prof ever took it past my proof of concept.

We were even able to track down some cameras located on campus which made for some hilarious phone calls.


How?


Clearly this wasn't made for the frontpage of HN; it'd be nicer if it polled the cameras server-side, cached the frames, and then served them up for the previews, refreshing every so often.


This was news about 6 years ago.


So many wasted IPv4s...


While we are on the topic of Google indexing things and revealing security holes I think that VoIP devices should also be mentioned.

I remember when I was taking a network security class in college, the professor was guiding us through the steps required to scan a network for vulnerabilities, specifically detecting services and control panels which are left open and vulnerable. Naturally we were using the college network for this, and in addition to the expected control panels of printers in different professors' offices, I accidentally found the control panel for the school's VoIP system, and it was not properly secured. I believe it was a Cisco system. Anyway, the control panel seemed to offer access to modify various settings of the college VoIP phone system, with no password protection.

Now granted it could be that I only had access to this because I was doing the scan from "inside the system" instead of outside via the web, but I'm sure there are vulnerable VoIP systems which have accidentally exposed their control panels to the internet.


>Now granted it could be that I only had access to this because I was doing the scan from "inside the system" instead of outside via the web

If 'inside the system' means from a University internet connection, then it is very much a security hole, as anyone physically present could exploit it. (Or at best, anyone the University allows on their network, which is a much larger group than the people who should be able to touch those settings.)


So, anyone want to take on the challenge of building a "Person of Interest"-like system and hooking it up to these publicly accessible cams? It would obviously be less early-warning than its fictional counterpart. Maybe I would try it, but I assume the computing resources needed would be large -- then again, wouldn't a botnet solve that? Or maybe my first brush with ML has left me naive about its capabilities.

Of course a 'Global Citizen Operative' would have to take action too; it is not like I proposed a "Global responsive network of autonomous drones to enforce peace and harmony" that would do that instead.

Sorry for the up-in-the-clouds comment, but the merciless hand of insomnia grabbed me and prompted my mind to wander.


For fun, I used to troll open webcams in Japan. The best were the ones you can pan around and zoom in and out of. Lots of nice vistas, sea ports, city scenes. I'd put them on in the background and provide some visual "noise" during the day.


But those are intended to be public, are they not?


I don't think so. They seem more like security cams or similar. Plus being able to pan and zoom them around makes me think they really aren't meant for more than the person who set them up.


This might raise a few eyebrows...

http://bit.ly/V4VtJJ


There is actually a crawler for "Machines and Devices", such as routers, IP phones, webcams, Dell DRACs, HP iLO, VMware ESX and so on.

http://www.shodanhq.com/browse

Also, check your server IP on Shodan to see if your firewall rules are exposing a little too much.



