I built an HTTP service that listens on all 65535 TCP ports and tells you which port you connected to (very useful for diagnosing which outbound ports are firewalled by ISPs or by Wi-Fi networks):
The folks at Cloudflare have done it with an iptables TPROXY rule (which requires the socket to have the IP_TRANSPARENT option) which is how I did it too. But there is another way to do this in Linux: you can use an iptables REDIRECT rule, and the userspace program can obtain the original destination port by doing a getsockopt() call to read SO_ORIGINAL_DST.
Edit: oh I see now the blog post does mention the REDIRECT & SO_ORIGINAL_DST option, but criticizes its performance... which makes sense given its dependence on conntrack.
There is a typo in Cloudflare's blog post: s/SO_TRANSPARENT/IP_TRANSPARENT/
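A minimal sketch of the REDIRECT approach described above, assuming a Linux host where inbound TCP has been steered to the listener by an iptables REDIRECT rule. The SO_ORIGINAL_DST constant is copied from <linux/netfilter_ipv4.h>, since Python's socket module doesn't export it:

```python
import socket
import struct

SO_ORIGINAL_DST = 80  # from <linux/netfilter_ipv4.h>

def original_dst(conn: socket.socket) -> tuple[str, int]:
    """Read the pre-REDIRECT destination of an accepted connection.

    Linux-only; works for connections that went through an iptables
    REDIRECT (or DNAT) rule tracked by conntrack.
    """
    raw = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)

def parse_sockaddr_in(raw: bytes) -> tuple[str, int]:
    # struct sockaddr_in layout: sin_family (u16, skipped), sin_port
    # (u16, network byte order), sin_addr (4 bytes), then padding.
    port, packed_ip = struct.unpack_from("!2xH4s", raw)
    return socket.inet_ntoa(packed_ip), port
```

The getsockopt() call only succeeds on a socket conntrack knows about, which is exactly the dependence the blog post criticizes.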
Indeed! Conceptually REDIRECT is _very_ similar to TPROXY. The subtle difference is that REDIRECT rewrites the destination host and port, while TPROXY keeps them intact and only does the routing earlier.
In practical terms, to recover the original target with REDIRECT you have to use the obscure SO_ORIGINAL_DST, while with TPROXY a plain getsockname() on the accepted socket will just work.
By that token TPROXY is a bit easier to use. This is all for TCP; UDP is a bit harder.
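To illustrate the TPROXY side: the kernel preserves the original destination as the accepted socket's local address. A sketch (names are illustrative; without a TPROXY rule and IP_TRANSPARENT in place, this simply reports the address the listener was bound to):

```python
import socket

def accept_and_report(listener: socket.socket) -> tuple[str, int]:
    """Accept one connection and return its local (destination) address.

    Behind an iptables TPROXY rule -- with IP_TRANSPARENT set on the
    listener -- this is the address and port the client originally
    dialed, even though the listener is bound elsewhere.
    """
    conn, _peer = listener.accept()
    try:
        host, port = conn.getsockname()[:2]
    finally:
        conn.close()
    return host, port
```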
neat. but why do you need a service at all to detect blocking? you can use timing to also do it easily without the need for any server component at all.
Perhaps this works poorly for firewalls near the service, but you declared the problem to be one close to the client, AIUI.
When an ISP blocks certain ports by dropping the SYN packet, the client sees a timeout. There is nothing to "time" that can prove it's the ISP dropping it.
Yes there is. When you don't get a RST back by the expected time (say 2x the RTT), you know the SYN was dropped. Are you arguing that it could be packet loss? You address that by taking multiple samples, and by comparing against loss on ports that you do get a SYN-ACK back from.
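The client-side classification described here can be sketched as follows (a hypothetical helper, not anyone's actual tool; it distinguishes a RST from silence, nothing more):

```python
import socket

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connect attempt.

    'open'    -> a SYN-ACK came back
    'rst'     -> a RST came back (host reachable, port closed)
    'dropped' -> nothing came back before the timeout
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        return "rst"
    except socket.timeout:
        return "dropped"
    finally:
        s.close()
```

In practice you would repeat each probe and compare against ports known to answer, as described above, to rule out plain packet loss.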
But you don't know who dropped it: the ISP or the remote server. In order to show it's the network between the client and server dropping it, you need a server that behaves in a known way, hence open.zorinaq.com.

I used to work in the InfoSec industry, running port scans from various locations, and open.zorinaq.com was incredibly useful to ensure there was no random firewall preventing us from finding certain open ports. That was the primary motivation for building the service.
Author here. The TPROXY module is pretty special; it really would have been hard to handle every inbound port without it. I guess it shows that there are benefits to keeping firewall and network-stack code tied closely together.
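For reference, the general TPROXY recipe looks roughly like this. The mark value, routing table number, and listener port here are conventional placeholders, the listener must additionally set IP_TRANSPARENT on its socket, and this is the generic pattern from the TPROXY documentation, not necessarily the exact rules either service uses:

```shell
# Deliver marked packets locally even though their destination
# address is not one of ours:
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100

# Hand every inbound TCP connection, whatever its port, to the
# transparent listener on 127.0.0.1:1234:
iptables -t mangle -A PREROUTING -p tcp \
    -j TPROXY --on-ip 127.0.0.1 --on-port 1234 --tproxy-mark 1
```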
Excellent article all the way around. I may assign this to my networking students further into the quarter. It provides a nice alternative use case that should help them think more critically about the relationship between applications, the socket API, and the rest of the network stack. Thanks!
This is great, thanks for sharing. I'm curious about the downstream proxy process (i.e. ::1234) and how you scale it and balance load across multiple instances of the process. You can't really use iptables to load balance your processes as either the DNAT or REDIRECT mechanism will modify the destination address, right?
Example:
# TPROXY directs all traffic to :1234, and these rules load balance to 4 different processes
iptables -t nat -I OUTPUT -p tcp -o lo --dport 1234 -m state --state NEW -m statistic --mode nth --every 4 --packet 0 -j DNAT --to-destination 127.0.0.1:8080
iptables -t nat -I OUTPUT -p tcp -o lo --dport 1234 -m state --state NEW -m statistic --mode nth --every 4 --packet 1 -j DNAT --to-destination 127.0.0.1:8081
iptables -t nat -I OUTPUT -p tcp -o lo --dport 1234 -m state --state NEW -m statistic --mode nth --every 4 --packet 2 -j DNAT --to-destination 127.0.0.1:8082
iptables -t nat -I OUTPUT -p tcp -o lo --dport 1234 -m state --state NEW -m statistic --mode nth --every 4 --packet 3 -j DNAT --to-destination 127.0.0.1:8083
I have a vpn for myself and some friends that accepts connections on all ports. I set it up over a decade ago on OpenBSD with a simple pf redirect. I have never had any problems with it, but it obviously doesn't see nearly as much traffic as Cloudflare.
Does OpenBSD handle this differently than Linux, or am I doing this wrong?
The trick is that they want the application to be able to see what the original destination IP and port were. I'm not sure if a pf redirect preserves that information.
Not sure the headline is accurate: surely these kernel mechanisms were invented specifically _to_ allow this functionality? Therefore there is no abuse. More like “we found a mostly-forgotten netfilter feature designed to do the thing we’re trying to do, so we used it”.
Not exactly. TPROXY is designed for transparent proxying, but by way of how it works it can also be used to approximate binding to all TCP ports. The latter use case is a bit different.
TPROXY is totally amazing. We used it to modify nginx to create a transparent SMTP proxy that scales. Using TPROXY, we can pretend to be millions of ISP subscriber IPs at once in a single process.
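The trick behind pretending to be arbitrary subscriber IPs is the IP_TRANSPARENT socket option, which lets a sufficiently privileged process bind to addresses the host does not own. A sketch, with the constant copied from <linux/in.h>, a documentation-range placeholder address, and CAP_NET_ADMIN assumed:

```python
import socket

IP_TRANSPARENT = 19  # from <linux/in.h>; not exported by Python's socket module

def spoofing_socket(src_ip: str, src_port: int = 0) -> socket.socket:
    """Return a TCP socket bound to an address this host doesn't own.

    Requires CAP_NET_ADMIN. With a matching TPROXY setup, the same
    option also lets a listener accept connections addressed to
    arbitrary IPs.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_IP, IP_TRANSPARENT, 1)
    s.bind((src_ip, src_port))  # non-local bind, allowed by the option above
    return s
```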
In the near future, I'll need to do something likely very similar to what you did (albeit, probably on a smaller scale). Are there any technical details about this that you can share or perhaps just some pointers to relevant and/or helpful documentation?
(N.B.: I won't even be starting on this for probably a month or two, so I haven't even begun to look into it. If there is documentation easily/readily available via a Google search (i.e., I'll find 'em as soon as I Google for 'em) then just ignore my request. Thanks!)
What happens to this once nftables takes over? I'm still using iptables in production, but I'm wary, since my understanding is that iptables is sort of deprecated in favor of nftables.
Right, TPROXY is an iptables module, which implies that without someone to port it (assuming porting is even possible, given the architectural differences) it isn't going to work with nftables.
To clarify my original question, what will cloudflare do if/when iptables finally goes away? Has thought been put into it? Will they implement their own type of TPROXY? Will they continue to support iptables themselves? There's quite a few paths, and I'm interested in which one they deem most optimal because I respect their opinions a lot.
Looking at the nftables code, I think the only reason nftables doesn't support TPROXY is that no one wrote some of the config parsing / serialization stuff.
Sounds like cloudflare might want to start trying to submit some nftables TPROXY support now, so it's there in the vanilla kernel when they end up needing it. :)
I'd expect someone to eventually submit such a patch, though I don't know how urgent this issue is. Iptables isn't going anywhere anytime soon, so Cloudflare can continue to use this method on its edge nodes.
What's the problem with SO_ORIGINAL_DST? Could you please explain a bit why the code is not encouraging? The author of TPROXY also mentioned somewhere else that SO_ORIGINAL_DST is racy, but I'm not a kernel developer and don't understand why. Thanks!
Digging deeper, I found more explanation on StackOverflow https://stackoverflow.com/a/5814636/184061 (seems to be written by tproxy author Balazs Scheidler judging by the username).
http://open.zorinaq.com/