Filter all ICMP and watch the world burn (rachelbythebay.com)
292 points by jonchang on May 16, 2015 | 57 comments



For those who are new to this issue: ICMP is the signalling protocol for IP itself. It provides a few services directly (e.g. classic ping), but critically, when something goes wrong at the IP layer elsewhere on the internet while a packet is being delivered, an ICMP message is usually sent back to the source. Over roughly the last 15-20 years, however, badly configured firewalls have increasingly blocked all ICMP, or certain types of it, which can break communication in subtle ways. In this case, type 3 code 4 ("Fragmentation Needed") was blocked. More info @ http://en.wikipedia.org/wiki/Internet_Control_Message_Protoc...


Thanks. Why does the source host set the Don't Fragment flag?


It is nearly universally the case that TCP packets have the DF flag set. This is because it's most efficient to have TCP segments sized to fit into a single packet each - if part of a segment doesn't arrive, the entire segment will be resent anyway.

So the DF flag is set, and the sending TCP reduces its estimate of the path MTU whenever it sees an ICMP "Fragmentation Needed" response.
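
To make that concrete, here is a minimal sketch of what "set DF and listen for Fragmentation Needed" looks like from userspace. It assumes Linux (the numeric option values are from <linux/in.h>, since Python doesn't export all of them) and the destination is just a placeholder:

    import socket

    IP_MTU_DISCOVER = 10   # setsockopt option: control DF / path MTU discovery
    IP_PMTUDISC_DO = 2     # always set DF, never fragment locally
    IP_MTU = 14            # getsockopt option: kernel's current path MTU estimate

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    s.connect(("example.com", 80))   # placeholder destination
    print("path MTU estimate:", s.getsockopt(socket.IPPROTO_IP, IP_MTU))
    s.close()

When an ICMP "Fragmentation Needed" comes back for that flow, the kernel lowers the cached path MTU and subsequent segments are sized to fit; if those ICMP messages are filtered, the estimate never shrinks and the oversized segments silently disappear.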


According to the original IPv4 RFC @ http://tools.ietf.org/html/rfc791 page #25...

If the Don't Fragment flag (DF) bit is set, then internet fragmentation of this datagram is NOT permitted, although it may be discarded. This can be used to prohibit fragmentation in cases where the receiving host does not have sufficient resources to reassemble internet fragments.

One example of use of the Don't Fragment feature is to down line load a small host. A small host could have a boot strap program that accepts a datagram stores it in memory and then executes it.

What it seems to mean is things like PXE[1] or BOOTP[2].

Basically, DF was built to let the sender forbid fragmentation by intermediate hosts en route whose next link has a smaller MTU than the packet the sender originally emitted. The original intent was that the sender would set it because it had somehow been made aware of limitations in the recipient's network stack.

Probably 20 or more years ago it was used for a while as a latency-related hack for certain applications (VoIP, video, low-latency finance, certain scientific experiments generating vast amounts of data, etc.), mostly over UDP[3], though we now have better methods for those that operate through other IP headers (QoS).

Theoretically, intermediate nodes could also use it as part of path selection during routing, though I have no idea if this has been done or is encouraged - eg. a packet from node A reaches node B en-route to node C. Node B has two routes to node C. The lower-cost route is available with a smaller MTU than the packet size, and a higher-cost route is available with a large enough MTU to accommodate the packet size. The DF flag could be used by routing logic at node B to automatically shuffle the packet across that higher-cost route.

I believe the Path MTU discovery[4] feature of modern Linux kernels also probably uses this mechanism combined with short TTLs (ie. maximum hop counts, which also cause ICMP errors to be returned on their failure, and are the basis of traceroute[5], itself an unintended hack based on unrecognized capabilities of the IP protocol's specification) to optimize long-lived traffic flows.

There must be other edge-case or optimization-related reasons to set it, too. The main thing to remember is that none of these were really intended by the authors of IPv4. This whole historic pile of what-if edge-case hackiness has been thrown out in favor of a better system in IPv6[6].

[1] http://en.wikipedia.org/wiki/Preboot_Execution_Environment

[2] http://en.wikipedia.org/wiki/Bootstrap_Protocol

[3] http://en.wikipedia.org/wiki/User_Datagram_Protocol

[4] http://en.wikipedia.org/wiki/Path_MTU_Discovery

[5] http://en.wikipedia.org/wiki/Traceroute

[6] http://en.wikipedia.org/wiki/IPv6#Simplified_processing_by_r...


The reason is that you never want routers to have to fragment your packets, ever. Fragmentation is really inefficient. So any modern stack will always set DF and listen for ICMP unreachables. In other words, PMTU discovery.

None of the other reasons you mention are really relevant.


So, the sender uses Don't Fragment as a tool to detect the optimal packet size.

Alternative idea: when a router needs to fragment a packet, it fragments it and passes it through, but also sends an ICMP message back to the source saying "fragmentation happened". (Or the party which reassembles the fragments, presumably the receiver, sends this "fragmentation happened" message.)

That way the IP packet does not need to be resent, the source can optimize its packet size for the future, and the network doesn't break if ICMP is disabled somewhere.

We don't want these "fragmentation happened" ICMP messages to be sent constantly (in case they never reach the source and it keeps sending large packets), so the router doesn't send one for every packet, only for the first 3 fragmented packets from a given source in each 10-minute window.

[I'm just thinking out loud; this isn't a real proposal.]
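
Still, a toy sketch of that rate limit, just to make the idea concrete (the window and the count are the numbers above; everything else is made up):

    import time
    from collections import defaultdict

    WINDOW = 600        # seconds (10 minutes)
    MAX_NOTICES = 3     # "fragmentation happened" notices per source per window

    notices = defaultdict(list)   # source IP -> timestamps of notices already sent

    def should_notify(src_ip, now=None):
        now = time.monotonic() if now is None else now
        recent = [t for t in notices[src_ip] if now - t < WINDOW]
        notices[src_ip] = recent
        if len(recent) < MAX_NOTICES:
            recent.append(now)
            return True    # send the ICMP notice; the packet is forwarded either way
        return False       # stay quiet for this source until the window rolls over

    # e.g. should_notify("192.0.2.10") is True for the first three oversized
    # packets from that source in any 10-minute window, False afterwards.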


What is strange to me: the guy in the article uses a web browser, ssh and other ordinary programs. Why would his network stack set the DF flag?


Maybe his kernel or an intermediate router has path MTU discovery enabled?


I found the answer - he uses IPv6, where routers never fragment, they just drop the packet and send ICMP response.


Unfortunately, almost every time I've dealt with PCI-DSS compliance auditors, they raise the fact that I haven't got ICMP completely filtered.

It's always an annoying, long argument with them about why ICMP exists, why it shouldn't be completely filtered, and what the potential side effects of filtering it are.


Not PCI-DSS, but I dealt with security auditors for a client that insisted that we shouldn't allow ping... To the address of the public website... Because we might reveal there was something there.

My passive-aggressive response was to point out, while copying the client, that a number of more prominent security auditors' sites responded to ping, as did the websites of any number of intelligence organizations, banks and similar.


Just had an audit with one of the biggest telcos, done by a large "security" firm. They insisted that the public website not respond unless the right Host header was there. Stupid, but OK, I can see it on a checklist for intranet apps.

But the real kicker: The site was TLS only, so connecting to the IP will still leak the hostname, from the cert.

Edit: This was a really big security firm, too. Totally worthless audit. They actually complained that a site admin could "include iframes in the HTML, which could be a malware vector" when uploading content. Ignoring that they could also upload scripts and arbitrary binaries.

After their weeklong "penetration test" concluded, I found some serious XSS (public user->admin, which could easily turn into system takeover) with about 5 minutes of looking. Are most audits this useless?


> After their weeklong "penetration test" concluded, I found some serious XSS (public user->admin, which could easily turn into system takeover) with about 5 minutes of looking. Are most audits this useless?

Hiya! I started at Matasano/NCC back in February. Part of the reason I joined was to find out whether or not everything tptacek has been saying for years is true. Turns out it's pretty much all true. Some of that is awesome, like the hiring process. Some of that is scary, like the fact that someone of moderate skill level can usually break into most production apps.

My experience is limited. That said, put me on an audit and the first thing I'll check for is XSS and SQLi. The second thing I'll check for is authz: log into an admin account, note a URL to perform an admin action, log in as a normal user, try to access that URL. Third thing I'll check for is if there are any upload forms, because that's a common way to get RCE: upload a file and try to trick the app into executing it. Etc. Stuff that matters.
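
In code, that second check can be as small as this sketch (the URL and the session cookies are hypothetical placeholders, not anything from a real engagement):

    import urllib.request
    import urllib.error

    ADMIN_URL = "https://app.example.com/admin/users/42/delete"   # hypothetical admin action
    SESSIONS = {
        "admin": "session=admin-session-cookie",   # captured while logged in as an admin
        "user":  "session=normal-user-cookie",     # captured while logged in as a normal user
    }

    def status_for(url, cookie):
        req = urllib.request.Request(url, headers={"Cookie": cookie})
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            return err.code

    for role, cookie in SESSIONS.items():
        print(role, status_for(ADMIN_URL, cookie))

    # If "user" gets the same successful response as "admin" instead of a
    # 401/403, the admin action is probably reachable without authorization.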

It's a point of pride to ensure that our assigned app has been pentested thoroughly by the conclusion of an audit. A thorough pentest doesn't necessarily mean finding every possible vulnerability, because time is often limited, but it does mean finding the serious ones.

If we include any findings in the final report which could be called "trivial" (there are occasionally some), they're marked as informational findings, i.e. their severity level is less than low. The reason we include them is because even though the finding doesn't necessarily pose any security risk, a client will often get another pentest from another security firm and diff the results. If the other firm points out something we thought was too trivial to include, the client will rightfully ask why we didn't find it. (We try to be pretty clear in the report about each finding, though, so you're not going to come away with the impression that we're saying you need to address something trivial.)

I don't know enough about your experience with that security firm you worked with to comment directly, but communication is one of the most important aspects of the job. If we find some flaws but don't communicate well to the client, then nobody was served by the audit. So if you're feeling like the whole process was a waste of time, you might want to shop around. There are several good security firms, not just Matasano/NCC, so you may want to give it another shot.

For what it's worth, the fact that you had a bad experience with one of the firms is actually painful to me. It's only recently that people started to care about security in a significant way, and it's a tenuous position. The more people who get a "security audit" and end up feeling like it was a waste of time, the more likely we are to end up back in a situation where people know there are probably serious security problems but feel like there's nothing they can do to find or fix them. There is: Give us a test environment and two weeks. We'll find what matters, and we'll give you a report explaining each issue and how to fix them.


Thank you for your comment! I want to add to it as a former pentester: it's absolutely painful to read nitpicky findings of the form "the world is going to burn if you don't change trivial security setting X", where the change would destroy the user experience. Not because these shouldn't be included in the report, but because the focus is wrong.

Like strayptr, I would also include the trivial issues as "informationals" in the report, since you do want your clients to know about these for a number of reasons. However, most of my time would go to hunting for severe issues, where I defined severe using a mental ranking based on difficulty to exploit, potential impact, and so on. These issues were also where most of my attention went afterwards, because you need to explain and educate development, testing and business about the issues, why you think they are important, and how to best and quickest fix them.

In my opinion, building up relationships and having empathy for your client is very important. I would always try to have a chat with development/test/business to get a feel of where their heads were at. That would help me both while testing (what is important to them? how did they develop it? what is their maturity?) and while reporting issues (they would actually believe me, I could help them rank the issues and they would allow me to brainstorm how to best mitigate the issues for their environment).


There's more cargo cultism in security than pretty much any other field of CS/IT that I am aware of. "Best practices" are cargo culted with zero understanding as to why they exist.

"If firewall equals secure, then MOAR firewall equals more secure!"

Blocking ICMP, of course, makes the firewall more firewally.

I try to explain that (a) firewalls are practically useless alone since most threat vectors today are "pulled" not "pushed," and (b) things need to actually work... but nope.

Unfortunately in security black hats tend to be more skilled than (most) white hats. Really good security people are some of the most knowledgeable people I've met, but there don't seem to be many of them. They also tend to be terrible salespeople (like most techies), so the bad checkbox-ticking "security firms" are the ones who get enterprise contracts.


The most annoying instance I've experienced with PCI compliance was an auditor who didn't understand the concept of patching. Even after I provided the full list of backported CVEs, which clearly included the one he insisted was missing from the system, he still refused to pass it. Only after escalating the issue was it finally marked as passed.


I've had that too - the only thing the auditor in question cared about was the version number that their run of nmap had guessed.


We got hit massively by this at CloudFlare (we were not explicitly filtering ICMP, but changes we made to our infrastructure meant that PMTU packets got lost). We wrote a blog post about this too: https://blog.cloudflare.com/path-mtu-discovery-in-practice/ and our solution: https://github.com/cloudflare/pmtud


Interesting!

I'd think that, in theory, the ECMP router could keep track of the MTU on a per-IP basis (rather than per TCP connection), based on having received the ICMP unreachable packet, and from that moment on send a spoofed ICMP packet back whenever one of the servers it's routing for sends a packet the router knows won't reach the host.

But even if that would work (I'm by no means a network engineer), your solution of simply broadcasting the packets is probably more efficient in the real world.


icmp in ipv6 does much more than its ipv4 counterpart and most importantly:

1) L2 address resolution (neighbor discovery), which ARP used to do in ipv4,

2) full network autoconfiguration (global scope addresses, default route(s), DNS resolver), which DHCP used to do in ipv4 (although DHCPv6 is still an option),

3) multicast group management (MLD), which igmp used to do in ipv4,

4) path MTU discovery (through the 'packet too big' messages this article references): routers fragment packets exceeding the link MTU in ipv4; in ipv6 they instead notify the source of the lower MTU.

ping, TTL exceeded, destination (host, route or port) unreachable and parameter problem were mostly carried over from ipv4.

Blocking 1, 2 (and to some extent 3) on a local network will most likely break ipv6 connectivity entirely while blocking the others will only break it in subtle, hard to debug ways (especially with ECMP and traffic engineering where multiple routes for a given destination can be used).

I've found that explaining this before asking network admins to unblock icmpv6 filters is a good way to succeed (although it can be hard, i'll give you that).

People aren't used to filtering ARP or link-local broadcast in ipv4 (which DHCP uses), so telling them that they need to allow icmpv6 just so that stations can configure themselves is a bit of a mentality change.

At the same time, developers of firewall management tools like ufw understood this problem a while ago and insert a working, tried-and-tested icmpv6 accept list as the first rules, which you can't mess with.

Telling people to use ufw is usually much better than teaching them ip[6]tables.
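
For reference, a sketch of the sort of accept list being described, as a type-number map you could use to generate rules (the type numbers are from RFC 4443 and RFC 4861; the selection is loosely in the spirit of RFC 4890, not copied from ufw's actual defaults):

    # ICMPv6 message types a local ipv6 network generally cannot work without.
    ICMPV6_MUST_ALLOW = {
        1:   "destination unreachable",
        2:   "packet too big (path MTU discovery)",
        3:   "time exceeded",
        4:   "parameter problem",
        128: "echo request (ping)",
        129: "echo reply (ping)",
        133: "router solicitation (autoconfiguration)",
        134: "router advertisement (autoconfiguration)",
        135: "neighbor solicitation (L2 address resolution)",
        136: "neighbor advertisement (L2 address resolution)",
    }

    # Emit one ip6tables rule per allowed type; adapt chain and policy to taste.
    for icmp_type, role in sorted(ICMPV6_MUST_ALLOW.items()):
        print(f"ip6tables -A INPUT -p icmpv6 --icmpv6-type {icmp_type} -j ACCEPT"
              f"  # {role}")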


We use a load balancer product that is Linux-based. It defaulted to blocking all IPv6 ICMP, including neighbor discovery. This made IPv6 not work at all. It was a struggle to get them to fix it. And I don't think they've released the update with the fix yet either.


Last time I checked, EC2 defaulted to filtering ICMP packets, with the predictable bad results:

http://www.daemonology.net/blog/2012-11-28-broken-EC2-firewa...


As a bonus, you can't enable them for things like ELBs where you don't control the box.


You can in VPC just fine. The ELB has a regular security group.


When people filter ICMP it really, really annoys me. Sure, I get that some people don't like to respond to pings, or that you can DDoS some routers by flooding them with 64-byte packets (old routers, btw), but hey, ICMP is a critical part of making the network work correctly.


Discovering Path MTU black holes on the Internet using RIPE Atlas: https://www.os3.nl/_media/2011-2012/courses/rp2/p57_report.p...

A Master's thesis research report from 2012 which examined this very problem on a global scale, using the RIPE Atlas monitoring network.


Eh, seen this so many times...

My favorite related problem was about 12 years ago when I had a Mac and a Linux box side-by-side and the Mac could connect to a Verizon site (a paging gateway) while the Linux box never even got a response to its SYN. I eventually figured out the Linux box had ECN enabled. Probably an out-of-date firewall at Verizon's end didn't like such exotic TCP options. Disabling ECN on the Linux box fixed the issue.

(I believe I was working on an email-to-page script at the time.)


ECN was a bit of a special case, and a major pain in the ass: it uses previously-unused bits in the header, and "security conscious" network hardware developers forgot the proper default behavior for unused bits: set them to zero when you create a packet, ignore them otherwise, and be liberal in what you accept.


Yup, I've had this: "ICMP is a security hole", let's turn it off.

Such a tedious conversation to have with the networks(!) team.


TCP and UDP are the real culprits. As a network admin with unquestionable authority I always advocate for banning them on my networks.


Well, after all some 99.9% of successful hacking attacks use those protocols. They're definitely very dangerous.


Dude, I bet those protocols were invented by humans. Humans mostly do destructive stuff. They're definitely very dangerous.


Can you explain how ICMP is insecure? Is it just ping being exploited?


ICMP can be used to affect routing on insecurely configured hosts. That might be an exploit vector? Circa 1995-97 there was the ping of death....but that was 20 years ago.

Most of these security best practices are like building code. They get written down and are never updated. For example, showers need a 2" drain pipe but tubs only need 1.5". The theory was that a backed-up drain would flood a bathroom with a lower shower rim very quickly, whereas it would take much longer with a tub (higher side wall). No one ever bothered to update the regulations once we stopped allowing 5 gpm showerheads. There's no reasoning behind it any more, just "nope, it says 2" required because we've always required 2"." "Why?" "Because that's how it is."

Same way with computer security.


"Code is not prescription."


I'm not sure of the specific reasons in this case. I think it was to do with possible internal host discovery. (I'm not sure how that would work in practice; yes, you can use traceroute, but I'm not sure how that would work behind NAT without forcing a client to open up a port first.)

Possibly it's to do with worry about smurf attacks, but then most firewalls newer than 2004 should deal with that gracefully. (Old company, though...)

Have a look here, it's an easy read and has a nice list:

http://resources.infosecinstitute.com/icmp-attacks/


[flagged]


I see. But why is this a security threat?


When a client tries to send packets too big for the network (as when the client is configured with jumbo frames but the network isn't), this can be really painful to debug. The worst is that many things will work because small packets get through. For example, an "scp" connection may successfully connect, and then just hang when it starts transferring real data.


This is a well known problem. Windows (since at least 2000) can detect this scenario and mitigate it [1].

[1] https://technet.microsoft.com/en-us/library/cc960465.aspx


I'd probably be saying "Eureka!" if this were 1997. But it's 2015. This is super basic stuff. Do people these days just blindly assume an MSS of 1460 is going to work on the Internet? Or do they think that the Internet is composed entirely of Ethernet links?

Or has the use of cloud providers and reliance on higher level programming languages produced a generation of ops people who don't understand the mechanics of how things work?


Everyone who downvoted my comment needs to pick up and read a copy of Comer's "Internetworking with TCP/IP". That and Stevens' TCP/IP Illustrated are the best sources for networking out there. You won't find the info on any blog. It won't be in something on ServerFault or StackOverflow and definitely not HN. Buy, read, learn.

Yes, I know the US publisher has messed with the pricing for current versions but you can find the previous ones used pretty cheap. Other than dropping the IPng chapter, I doubt much has changed.


The minimum MTU in IPv6 is 1280, which means that even if we fall back to the minimum just to get traffic to flow, it's not as terrible as IPv4's minimum MTU of 576.


That just means you'll have to encapsulate ipv6 in ipv4 to get across those really poorly configured routers...


Would those routers support 6 in the first place?


If it's encapsulated it doesn't matter :)


Brings back memories from my past... http://znep.com/~marcs/mtu/

Woefully out of date and wrong, but helped some folks out.


We have what I think is a long-running MTU problem on Rackspace hosting for a customer. We are losing parts of HTTP requests between the F5 load balancer and the customer's web servers (which are running IIS).

The header of the request reaches IIS, and then the content body of the request fails to turn up, causing a 500 error on the server.

The issue is mostly seen on POST requests, where the content of the request is split over more packets. It's been driving us nuts.

I wonder if we should check for ICMP blocking too?


Hey! Ex-Racker (NetSec) here. The truth is on the wire, get captures and check where the packets are dropped.

Do you have the F5 in front of the webservers, or on a different interface in one-arm mode? In the second case you're going through the firewall, so I guess that's your scenario.


> ipv6

> someone blackholed the very important packets which say "fragmentation needed but DF set"

IIRC IPv6 never fragments and uses path MTU discovery (via ICMPv6)


IPv6 does have fragments. Routers will not do the fragmentation; they just drop the packet and force the sender to fragment. It can be horrible if you're using a tunnel broker and don't lower your MTU.


Heh, back when Google was just a white page with a search bar, I noticed that the front page would come up but the results didn't. Turns out I was on a home network connection with PPPoE, which slightly decreases the maximum MTU.

I opened a ticket with Google and an SRE called me back (to my surprise), and we tracked it down. Google had a new firewall that was blocking MTU negotiation.


Educational Stack Overflow from the Google results for 'mtu':

http://serverfault.com/questions/43866/whats-the-best-mtu-se...


Damn, I had this issue a while ago but I was not able to figure out the cause. I posted to Stack Overflow but nobody pointed me in this direction. Now I went back and answered my own question :) Thanks @jonchang!


Some application protocols require ICMP alongside their TCP or UDP port, and won't send any TCP or UDP packets until an ICMP ping has successfully been responded to.


It's really kind of depressing that this is 'news' enough that it gets 245+ upvotes here. It can't be older than the internet itself, obviously, but it's damn close...


Read the title and immediately knew it would be PMTU.


Ditto. Yet I still have this conversation with self-styled "network architects" who want to blanket-block all ICMP. It's depressing.



