Diagnostic page for Google.com (google.com)
410 points by dushyant on Jan 23, 2015 | 70 comments



A few honest questions:

1. If Google detects something as malware, i.e. Google's software knows that it can be dangerous to users, then why can't it prevent itself from acting as an intermediary? Also, why doesn't it stop hosting malware?

2. >>> Malicious software is hosted on 279 domain(s), including 24corp-shop.com/, abu-farhan.com/, soaksoak.ru/.

These web domains do not belong to Google. It seems Google is downloading several pages onto its servers for various purposes. Is that legal in all countries?

From an architecture point of view, is it difficult to sandbox the user-facing google.com search engine from the above websites at all times, so that if malware is there, it can't affect the search engine or other major parts? Users are not security-literate.

3. What should I do as a user? Just ignore this, assuming that it is for webmasters and not for ordinary users?

Honestly, for me personally, malware on Google is unimaginable, since we consider it the gold standard of the web.


It's important for us (I work at Google on web-search) to be transparent about these reports, and we use them to remove / block content that is malicious too (just like other sites can use the Safe Browsing API to get information about sites they host). With regard to where it's hosted, there are two main elements involved: a site that actually hosts the exploit (which could be a Windows EXE file, etc.), and a site that sends the user to that exploit. Often these are separate. Sometimes it's not even a direct embedding of a known malicious site. For example, a counter/analytics-tracking site could be hacked, which could result in all other sites that use those counters/scripts unknowingly sending users to malicious content.
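Site owners can query the same data through the Safe Browsing API mentioned above. The sketch below targets the v4 Lookup API endpoint (`threatMatches:find`), which is newer than this thread; the client ID, version string, and choice of threat types are placeholder assumptions, and you need your own API key.

```python
import json
import urllib.request

# Safe Browsing Lookup API v4 endpoint (assumption: you use v4 and have an API key).
LOOKUP_URL = "https://safebrowsing.googleapis.com/v4/threatMatches:find?key={key}"


def build_lookup_body(urls, client_id="example-client", client_version="0.1"):
    """Build the JSON request body asking whether any of `urls` is on a threat list."""
    return {
        # clientId/clientVersion identify your application; these values are placeholders.
        "client": {"clientId": client_id, "clientVersion": client_version},
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": u} for u in urls],
        },
    }


def lookup(urls, api_key):
    """POST the lookup request and return the list of matches (empty if none)."""
    data = json.dumps(build_lookup_body(urls)).encode("utf-8")
    request = urllib.request.Request(
        LOOKUP_URL.format(key=api_key),
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        result = json.load(response)
    # The API answers with an empty JSON object when nothing matched.
    return result.get("matches", [])
```

Something like `lookup(["http://soaksoak.ru/"], api_key="YOUR_KEY")` would then return any threat-list matches for that URL.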

From talking with webmasters, I have seen almost no false positives in this flagging, but it's sometimes very hard to find the actual exploit. It sometimes hides from some visitors (direct visitors, like the webmaster, might not see it; it might only be visible to those coming from search), and sometimes it is limited to certain geographies or devices. This makes finding the exploit hard, and fixing the website so that it's no longer vulnerable to the attack that dropped the exploit often isn't easy either.
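One way a webmaster can check for the referrer-based hiding described above is to fetch the same page twice, once as a direct visitor and once with a search-engine `Referer` header, and compare the external scripts and iframes each response pulls in. This is a minimal sketch that assumes the exploit keys on the `Referer` header alone; exploits that filter by IP address, geography, or device will not show up this way, so an empty result does not prove a site is clean. The function names are hypothetical.

```python
import urllib.request
from html.parser import HTMLParser


class ExternalSourceCollector(HTMLParser):
    """Collects the src attribute of every <script> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = set()

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "iframe"):
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.add(value)


def external_sources(html):
    parser = ExternalSourceCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


def cloaked_sources(direct_html, referred_html):
    """Sources that appear only when the page is fetched as a search visitor."""
    return external_sources(referred_html) - external_sources(direct_html)


def fetch(url, referer=None, user_agent="Mozilla/5.0"):
    """Fetch a page, optionally pretending to arrive from a search results page."""
    headers = {"User-Agent": user_agent}
    if referer:
        headers["Referer"] = referer
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage: `cloaked_sources(fetch(url), fetch(url, referer="https://www.google.com/"))`.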

I take these warnings very seriously when I see them in the browser, even when accessing a site with a fairly locked-down & up-to-date browser. I would recommend never skipping them, even to diagnose an issue (use other tools for that).


Hi there! Why are there so many security reports on Blogspot? http://www.google.com/safebrowsing/diagnostic?site=blogspot....


My theory: People search for stuff on Google. The search results page has a result with a download from abu-farhan.com. People click that link on the search results page, the download starts. Now google.com has "hosted" a malware download.


> Also, why it does not stop hosting malware?

Keep in mind that this is not reporting that malware is currently present; it is reporting that the last time it checked, it found malware, which may have been taken down since then. It doesn't tell you anything about how long the malware stayed up.

> These web domains do not belong to Google. It seems google is downloading several pages onto its server for various purposes.

I have no specific knowledge of this, but my guess would be that these are just the targets of links.



"Of the 153 pages we tested on the site over the past 90 days...."

That's a small sample.



http://prntscr.com/5wi90o just to prove google wrong


Would some kind soul please describe to me what this does? My eager-beaver corporate network admins seem to consider it some kind of problem site, and its URL is blocked by our gateway proxy.


Indented and hard-wrapped, for your viewing pleasure:

    Safe Browsing Diagnostic page for google.com

    What is the current listing status for google.com?

        This site is not currently listed as suspicious.

        Part of this site was listed for suspicious activity 12 time(s) over the
        past 90 days.

    What happened when Google visited this site?

        Of the 6815255 pages we tested on the site over the past 90 days, 1686
        page(s) resulted in malicious software being downloaded and installed
        without user consent. The last time Google visited this site was on
        2015-01-22, and the last time suspicious content was found on this site was
        on 2015-01-22.

        Malicious software includes 139894 exploit(s), 2748 trojan(s), 502 virus.
        Successful infection resulted in an average of 5 new process(es) on the
        target machine.

        Malicious software is hosted on 275 domain(s), including 24corp-shop.com/,
        abu-farhan.com/, soaksoak.ru/.

        296 domain(s) appear to be functioning as intermediaries for distributing
        malware to visitors of this site, including southeastasianarchaeology.com/,
        thesmallbusinessplaybook.com/, impots-economie.com/.

        This site was hosted on 3 network(s) including AS36040 (YOUTUBE), AS43515
        (YOUTUBE), AS15169 (GOOGLE).

    Has this site acted as an intermediary resulting in further distribution of malware?
        Over the past 90 days, google.com appeared to function as an intermediary
        for the infection of 528 site(s) including s3.amazonaws.com/lowlordyok/,
        s3.amazonaws.com/fann21ahsdc/, s3.amazonaws.com/skcfb01kpl/.

    Has this site hosted malware?

        Yes, this site has hosted malicious software over the past 90 days. It
        infected 22 domain(s), including burguscircus.free.fr/,
        plus.google.com/112502198606472559837/, beljews.info/.

    Next steps:
        Return to the previous page.
        If you are the owner of this web site, you can request a review of your
        site using Google Webmaster Tools. More information about the review
        process is available in Google's Webmaster Help Center.


Hand-wrapped? Please tell me a bookmarklet did that?! :-)


Use Vim.
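If you'd rather not reach for Vim (where `gq` rewraps text to the current `textwidth`), Python's standard `textwrap` module can do the same hard-wrapping and indenting. A minimal sketch; the 80-column width and four-space indent are arbitrary choices, not necessarily what the commenter used.

```python
import textwrap


def hard_wrap(paragraphs, width=80, indent="    "):
    """Wrap each paragraph to `width` columns, prefixing every line with `indent`."""
    wrapped = [
        textwrap.fill(p, width=width, initial_indent=indent, subsequent_indent=indent)
        for p in paragraphs
    ]
    # Keep a blank line between paragraphs, as in the indented copy above.
    return "\n\n".join(wrapped)
```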




I'm not sure re-posting the site answers the OP's question.

This is likely the result of user-generated content running on a google.com subdomain.


"Would some kind soul please describe to me what this does, my corporate eager beaver network admins seem to consider this some kind of problem site and its URL is BLOCKED by our gateway proxy."


It's Google eating its own tail and doing a 'malware report' on google.com.



Google Shares owned by GWEI: 819 Amount of USD: 405.413,19

Not bad at all!


Welp, that's fraud.


There was a pretty rowdy thread here when this site was discovered. That was pretty much the end result / public opinion.


To be fair, it could be that your network admins are just using the web filter's default settings, which may be what's blocking this page.

I had to frequently whitelist pages when using Websense and our ASA.


I just tried with a known, compromised site and that scanner said it was OK, so be cautious.


I have seen PHP malware-distribution code that checks whether the request comes from a Google address and, if it does, doesn't show the payload to that "user".


AS36040 (YOUTUBE), AS43515 (YOUTUBE), AS15169 (GOOGLE), AS54113 (FASTLY), AS36459 (GITHUB), AS16509 (AMAZON-02), AS14618 (AMAZON-AES), AS16509 (AMAZON-02), AS38895 (AMAZON-AS-AP) and so on.

Could someone tell me more about those network codes?

Where do they come from? Are they specific to Google, or do they follow some standard?


AS numbers are part of the BGP protocol. When you are a large organization with multiple points of presence on the internet, you need to advertise your prefixes (routes), i.e. the IP blocks that you host behind your routers. To do that you need to be an "Autonomous System", and to become one you register with a Regional Internet Registry such as ARIN or RIPE NCC (IANA hands out blocks of AS numbers to the RIRs). It costs around $500, I think, and you need to prove that you actually need one; in return you get an AS number. The technical details are here: https://tools.ietf.org/html/rfc4271
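To make the numbers in the list above a little more concrete, here is a sketch that parses entries like `AS15169 (GOOGLE)` and classifies each number. AS numbers are 32-bit unsigned integers (RFC 6793 extended them from the original 16 bits), and RFC 6996 reserves 64512 to 65534 and 4200000000 to 4294967294 for private use. The regex and function names are illustrative assumptions.

```python
import re

# Private-use AS number ranges reserved by RFC 6996 (inclusive).
PRIVATE_ASN_RANGES = [(64512, 65534), (4200000000, 4294967294)]

# Matches entries such as "AS15169 (GOOGLE)" or "AS16509 (AMAZON-02)".
AS_ENTRY = re.compile(r"AS(\d+)\s*\(([^)]+)\)")


def parse_as_list(text):
    """Return (asn, name) pairs in the order they appear in `text`."""
    return [(int(num), name) for num, name in AS_ENTRY.findall(text)]


def classify_asn(asn):
    """Return (size, is_private) for a single AS number."""
    if not 0 <= asn <= 4294967295:
        raise ValueError("AS numbers are 32-bit unsigned integers")
    size = "16-bit" if asn <= 65535 else "32-bit"
    is_private = any(lo <= asn <= hi for lo, hi in PRIVATE_ASN_RANGES)
    return size, is_private
```

Every number in the list quoted above is a public 16-bit ASN under this check.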


AS numbers are how you announce to the Internet the prefixes where some content C resides (e.g. 1.0.0.0/24), and how you, through your provider announcing your own prefix to the Internet, find a path to content C's services.
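Once routes for overlapping prefixes have been learned, traffic is forwarded along the most specific (longest) matching prefix. The sketch below shows that longest-prefix-match step with Python's `ipaddress` module. The routing table and its origin AS numbers are made up (drawn from the private-use range) and are not real announcements; choosing which path to install for each prefix (BGP best-path selection) is a separate, more involved step that this sketch ignores.

```python
import ipaddress

# Hypothetical routing table: prefix -> origin AS (private-use numbers, not real data).
ROUTES = {
    "1.0.0.0/16": 64512,
    "1.0.0.0/24": 64513,
    "0.0.0.0/0": 64514,  # default route
}


def origin_as(ip, routes=ROUTES):
    """Return the origin AS of the longest prefix containing `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, asn in routes.items():
        net = ipaddress.ip_network(prefix)
        # Prefer the matching network with the longest prefix length.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return None if best is None else best[1]
```

Here `1.0.0.5` falls in all three prefixes but is attributed to the /24, while `1.0.200.1` only matches the /16 and the default route.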

http://www.cidr-report.org/cgi-bin/as-report?as=AS15169&view...

Complexities aside, this is how your Internet works.


If I'm not mistaken it's this:

http://en.wikipedia.org/wiki/Autonomous_System_(Internet)

Basically if you get IPs from ARIN or another RIR:

http://en.wikipedia.org/wiki/Regional_Internet_registry

You get a name on your block.


Those are the [Autonomous System numbers](https://en.wikipedia.org/wiki/Autonomous_System_%28Internet%...). They are given out by IANA.


How can google.com be used to serve malicious content?


For example, Gmail is on google.com, and so are Google Drive, Google Translate (I think this might be a big one), and various other services that host user content.


And code.google.com

Someone briefly had a pointer to http://www.google.com/safebrowsing/diagnostic?site=code.goog..., which includes:

> Malicious software is hosted on 23 domain(s), including sms-bomber.googlecode.com/, gdata-issues.googlecode.com/, infojob.googlecode.com/.


Don't forget:

sites.google.com, wiki.google.com, apps.google.com

This probably scanned Google's old Sites product, the Geocities equivalent of the early '00s.


For a couple of years now, Google has sent you through a redirect at google.com/something when you click a result URL (you only notice this on a slow connection such as an EDGE/2G network).

It might very well be that the malware scanner picked up a link to such a "redirector" that leads to malware and then flagged the domain google.com as malicious.
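The redirector described above wraps the real destination in a query parameter, so a scanner (or a curious reader) can recover the target URL before judging it. A minimal sketch, assuming the wrapper lives at the `/url` path on a google.com host and carries the destination in a `q` or `url` parameter; those names reflect what such links have commonly looked like, not a documented contract, and `unwrap_redirect` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_redirect(url, params=("q", "url")):
    """If `url` looks like a google.com/url redirector link, return its target."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    is_google = host == "google.com" or host.endswith(".google.com")
    if is_google and parts.path == "/url":
        query = parse_qs(parts.query)
        for name in params:
            if query.get(name):
                return query[name][0]
    # Not a redirector link (or no target found): leave it unchanged.
    return url
```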

That's another reason why one should never host user-generated files (or links/redirects) on the primary domain. GitHub moved its Pages sites to github.io for the same reason.


I wonder if malicious plugins are modifying the page and injecting things? I've encountered ones that physically change Google results pages, it looks like the page came from Google but the results are from some scam network.


Ads?


Had the same question, actually.


The diagnostic page doesn't appear to always be strictly accurate. For instance, it says "Google has not visited this site within the past 90 days." for many of my sites which it has crawled daily for years.


I assume there are different levels of data collected for different types of visits. For example, Google may just collect data for the PageRank algorithm (i.e. your pages have been visited by google.com) or they may also collect safebrowsing/diagnostic data (i.e. your pages have been visited by google.com and http://www.google.com/safebrowsing/diagnostic).


http://www.google.com/safebrowsing/diagnostic?site=www.sourc...

"Of the 10 pages we tested on the site over the past 90 days, 0 page(s) resulted in malicious software being downloaded and installed without user consent."

What about bundleware fail?

http://httpshaming.tumblr.com/post/95068402386/filezilla-sou...


My personal site is literally safer than google by google's own tool.


Probably quite a bit less traffic and content too.


Just a smidgen. If you sign up, I would have 1 user. But it's safe.


Is this due to people with viruses/compromised routers going to their home screen (google) and having that redirected to some malicious page?


Some of it could actually be hosted through code.google.com


Ah, good point, though github.io does not seem to have that problem http://www.google.com/safebrowsing/diagnostic?site=github.io


repost from another sub-thread [1]...

Someone briefly had a pointer to http://www.google.com/safebrowsing/diagnostic?site=code.goog..., which includes:

> Malicious software is hosted on 23 domain(s), including sms-bomber.googlecode.com/, gdata-issues.googlecode.com/, infojob.googlecode.com/.

I would actually compare it to github.com and github.io http://www.google.com/safebrowsing/diagnostic?site=github.co...

[1] https://news.ycombinator.com/item?id=8934594


On the one hand, I feel smugly better about using another browser[1]. But how could I feel this without google (i.e. google.com/safebrowsing/diagnostic) to provide the ammunition? I'm so confused now.

[1] http://www.google.com/safebrowsing/diagnostic?site=duckduckg...


It says they visited 6,815,255 Google pages and 1,686 of them contained malware. It only visited 152 DuckDuckGo pages. At Google's rate (about 1 page in 4,000), it would have to visit well over an order of magnitude more than 152 pages (roughly 4,000) in order to expect even one of them to contain malware.
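The arithmetic behind this, as a sketch that assumes every tested page is independently infected at Google's observed rate (a strong simplification; `expected_hits` is a hypothetical helper):

```python
import math


def expected_hits(flagged, tested, sample):
    """Scale an observed infection rate to a different sample size."""
    rate = flagged / tested                # fraction of tested pages that were flagged
    expected = rate * sample               # expected flagged pages in the smaller sample
    pages_for_one = math.ceil(1 / rate)    # sample size at which one hit is expected
    p_zero = (1 - rate) ** sample          # chance of seeing no flagged pages at all
    return rate, expected, pages_for_one, p_zero


rate, expected, pages_for_one, p_zero = expected_hits(1686, 6815255, 152)
```

With these figures `expected` is below 0.04 and `pages_for_one` is just over 4,000, so a clean DuckDuckGo report is what you'd see roughly 96% of the time even at Google's infection rate.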


Another browser? You mean search engine?


Thank you for the correction. (I clearly hadn't had my morning coffee yet!)


Does DDG attempt to detect sites that serve malware though? Also, s/browser/search engine/.


As far as I know, DDG does just browser/search. By extension, it shouldn't be able to do as much harm as a company that provides many more services (e.g., safebrowsing/diagnostic). [edited the following 2 sentences for readability.] For example, one possible threat area could be code.google.com. DDG doesn't have a counterpart for hosted code, so it can't possibly be a threat there.

Hence, my confusion. Yes, part of me was just trying to be humorously sarcastic. But part of me really enjoys some of the innovative ways that Google leverages data. And yet another part of me thinks that they have stepped over important privacy and security lines in other areas. Hence, I prefer DDG for the vast majority of my search needs.


Now I'm sure you have the term browser confused with search engine. http://en.wikipedia.org/wiki/Web_browser


So now all my domains have a "permanent record". I am having grade school flashbacks.


If by "permanent" you mean "90 days", then yes.


The comparison with Bing is interesting:

http://www.google.com/safebrowsing/diagnostic?site=bing.com

"This site was hosted on 25 networks" vs. 3, 1 virus vs. 503.


Also 83476 pages tested vs 6893497.


Is it the ads? Most malware is distributed through ads. I don't think there's a risk in AdWords text-based ads, but display ads frequently include malicious software. That, and download.com. I'm glad I use an ad blocker.


I wonder why your comment is downvoted. It seems there is some bot on this site randomly downvoting comments.



2 of 139816 is 0,000014305%


isn't it 0.00143045%?
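The two figures differ only by the factor of 100 between a plain fraction and a percentage:

```python
flagged, total = 2, 139816
fraction = flagged / total   # share as a plain fraction (about 1.43e-5)
percent = fraction * 100     # the same share expressed as a percentage
print(f"{fraction:.9f} as a fraction = {percent:.8f}%")
```

Rounded, the fraction is the 0.000014305 from the first comment; multiplying by 100 gives the 0.00143045% from the reply.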


Odd that it provides links to the sites that hosted malicious content. That seems like a poor design decision.


Actually they link to the Diagnostic page for the malicious site.


This is hilarious


This comment has been downvoted by the downvoting bot.


There may be a downvoting bot, but if the culture here is anything like reddit was 5-7 years ago 'haha' type comments are probably frowned on.



I'm OCD and a bit of a perfectionist, so when I use this on my site I spend hours making sure there isn't a single issue. After seeing that, I would go insane.

Makes you wonder why M$ doesn't make their site more standards-compliant. As a multi-billion-dollar company, they should hold themselves to a higher bar and meet the W3C standards.



