When TCP sockets refuse to die (popcount.org)
243 points by ingve on Sept 23, 2019 | 25 comments



Then there is a discussion about forcefully killing sockets :)

* close(): socket will be lingering in background as usual

* shutdown(SHUT_RD): no network side effect, discards read buffer

* shutdown(SHUT_WR): equivalent to FIN

* SO_LINGER socket: if the timeout is non-zero, blocks until the write buffer is flushed; if the timeout is zero, immediately sends RST

* the trick with close() after TCP_REPAIR (https://lwn.net/Articles/495304/): immediately discards a socket with no network side effects.

* "ss --kill" command: forcefully closes a socket from an outside process, done with the netlink SOCK_DESTROY command.
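For anyone who wants to see the zero-timeout linger case from the list above in action, here is a minimal sketch over loopback. It assumes Linux or a BSD, where struct linger is two C ints; the helper names (connected_pair, abort_with_rst, peer_sees_reset) are made up for illustration.

```python
import socket
import struct


def connected_pair():
    """Return (client, server_side) TCP sockets connected over loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    listener.close()
    return client, server_side


def abort_with_rst(sock):
    """Close `sock` with SO_LINGER enabled and a zero timeout.

    Assumption: struct linger is (int l_onoff, int l_linger), as on Linux/BSD.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()


def peer_sees_reset():
    """Return True if the other end observes a reset instead of an orderly EOF."""
    client, server_side = connected_pair()
    abort_with_rst(client)
    server_side.settimeout(5)
    try:
        server_side.recv(1)  # returning b"" here would mean a normal FIN
        return False
    except ConnectionResetError:
        return True
    finally:
        server_side.close()


if __name__ == "__main__":
    print("peer saw RST:", peer_sees_reset())
```

Without the setsockopt() call the same close() would normally produce a FIN, and the peer's recv() would return an empty byte string instead of raising.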


> * shutdown(SHUT_RD): no network side effect, discards read buffer

My understanding is that if the read buffer is not empty, or if you later receive any further data from the other end, this will result in a RST.

With regard to linger behaviour: "if timeout non-zero blocks until write buffer flushed" is only true of blocking sockets. For non-blocking sockets things get complicated and vary across platforms.


I made an attempt to check:

* shutdown(SHUT_RD): seems not to have _any_ side effects. You can totally still recv() on that socket. Kerrisk writes in 61.6.6: "However if the peer application subsequently writes data on its socket, then it is still possible to read that data on the local socket". Basically, SHUT_RD makes recv() return 0. That's all it does.

* SO_LINGER on O_NONBLOCK: shutdown() doesn't block. close() still blocks.


This highlights a few more details on SHUT_RD: https://books.google.com/books?id=ptSC4LpwGA0C&pg=PA173&lpg=...

That is not discussed in POSIX at all, I believe, so basically platform-unaware SHUT_RD is vaguely defined and I wouldn’t even rely on recv() returning zero in particular.

Edit: changed books domain to .com


Ok so I decided to check it empirically. Behavior is indeed platform-dependent.

Linux: after shutdown(SHUT_RD) all blocked recv() calls unblock and return 0. But the other side can still send data and the recv() call will still read it! It is just that after shutdown when there is nothing to read a recv() call immediately returns 0 instead of blocking.

macOS (and BSD, I presume?): The read buffer is discarded and all subsequent recv() calls return 0. If the other side sends data it is discarded.

Unfortunately I have no Windows machine around to try out.
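The check described above can be reproduced with something like the following sketch. The function name probe_shut_rd and the 0.2 second wait are arbitrary choices of mine, and the second value it returns is expected to differ between platforms, so no output is claimed here.

```python
import socket
import time


def probe_shut_rd():
    """Return (recv result right after SHUT_RD, recv result after the peer sends)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()
    listener.close()
    try:
        client.shutdown(socket.SHUT_RD)
        client.settimeout(2)          # fail loudly rather than hang if recv() blocks
        before = client.recv(16)      # nothing is queued at this point
        peer.sendall(b"late")         # the other side keeps writing
        time.sleep(0.2)               # give the data time to arrive
        after = client.recv(16)       # data (as reported for Linux) or b"" (as reported for macOS)
        return before, after
    finally:
        client.close()
        peer.close()


if __name__ == "__main__":
    print(probe_shut_rd())
```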

Now, maybe someone can clarify, given such wildly different behavior what is the intended use case for shutdown(SHUT_RD)?


It is likely a remnant of non-PF_INET families under SOCK_STREAM. There is no SHUT_RD in TCP by design.

Sockets have historically not been a very standardized landscape. The best bet here is to just stick with the POSIX definition, "Disables further receive operations", and follow it to the letter by not recv'ing anything anymore.


Thank you very much for the SOCK_DESTROY. I have completely missed its addition to the kernel.


Also, just to clarify: Shutdown does not release the socket; the fd is still valid and you need to call close on it to eventually release all resources related to the socket.
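A small sketch of that point: the descriptor stays valid after shutdown() and only goes away on close(). It uses socket.socketpair() for brevity, which is an assumption of convenience (on Unix that gives AF_UNIX sockets, not TCP), and fd_lifecycle is a made-up name.

```python
import socket


def fd_lifecycle():
    """Return (fd after shutdown, fd after close) for one end of a socket pair."""
    a, b = socket.socketpair()
    try:
        a.shutdown(socket.SHUT_RDWR)      # both directions shut down
        fd_after_shutdown = a.fileno()    # still a real descriptor
        a.close()                         # this is what releases it
        fd_after_close = a.fileno()       # Python reports -1 for a closed socket
        return fd_after_shutdown, fd_after_close
    finally:
        b.close()


if __name__ == "__main__":
    print(fd_lifecycle())
```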


Side conversation.

I am constantly amazed how a tiny piece of code, the Linux (or BSD) TCP stack, can be a source of mysteries and adventures for decades, even for kernel experts and industry leaders like CF. The thing has around 11 states and 20 transitions, around 4000 LOC.

Compare this with some of the multi-million LOC, distributed monsters we all know and love.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...


This is an example of "the settings problem". When a system has even a moderate number of behaviors controlled by their respective settings the number of possible interactions rises exponentially. Combine that with different possible user behaviors and you get a combinatorial explosion of possibilities with some weird results that are non-obvious even to experts.


That's a good point, plus it's not just (states x transitions), it's also a whole bunch of hidden state: the other guy's state as well as all the packets in flight.


Isn't that just TCP (rather than the TCP stack)?


Yes, sure, but the protocol and one implementation of it are pretty tightly coupled. As a measure of algorithmic complexity (à la Chaitin?), the kernel code should show us at least the order of magnitude: 4K lines of code.

Compare that to, say, Google, which is 2 billion: a tad more complex.


If you are going to force a minimum drain rate, please make sure you use a large enough monitoring period. With the patch in "The curious case of slow downloads", once 60 seconds have passed it starts checking download speed as often as every second, which is really aggressive. If you have a slow connection that's not super-stable, you're still going to get kicked, even if you're well over the minimum drain rate on average. An average of some kind over 15-20 seconds would be a lot more appropriate here.
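To make the suggestion concrete, here is a rough sketch of averaging the drain rate over a sliding window instead of judging each second on its own. This is not the patch from the article; the class name, the 20 second default and the explicit timestamps are all my own assumptions.

```python
from collections import deque


class DrainRateMonitor:
    """Judge a connection by its average drain rate over a sliding window."""

    def __init__(self, min_rate, window=20.0, start=0.0):
        self.min_rate = min_rate   # required average, in bytes per second
        self.window = window       # averaging period, in seconds
        self.start = start         # when monitoring began
        self.samples = deque()     # (timestamp, bytes drained) pairs

    def _expire(self, now):
        # Drop samples that have fallen out of the window.
        while self.samples and self.samples[0][0] <= now - self.window:
            self.samples.popleft()

    def record(self, now, nbytes):
        """Note that `nbytes` left the send buffer at time `now`."""
        self.samples.append((now, nbytes))
        self._expire(now)

    def too_slow(self, now):
        """True only once a full window has elapsed and its average is too low."""
        if now - self.start < self.window:
            return False           # not enough history to judge yet
        self._expire(now)
        total = sum(nbytes for _, nbytes in self.samples)
        return total / self.window < self.min_rate


if __name__ == "__main__":
    monitor = DrainRateMonitor(min_rate=1000, window=20.0, start=0.0)
    monitor.record(5.0, 30000)     # one burst, then silence
    print(monitor.too_slow(20.0), monitor.too_slow(26.0))
```

A bursty link that moves 30000 bytes in one go and then stalls for a few seconds still averages 1500 bytes per second over the window, so it survives a check that a per-second measurement would fail.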


Very true. Just today, I had to use an old 3G phone tethered to my laptop for a data connection and found my ping times to be on the order of 5-10 seconds, with data getting sent/received in sporadic bursts at much lower latencies in between. It wasn't fun trying to get work done on such a connection.


What's really weird is that TCP is an endpoint protocol and should have been relatively easy to upgrade/replace/change (relative to, say, IP).

But why haven't we moved to something better?

Say, why doesn't Apple use a better suited protocol between Apple devices and Apple servers? Why doesn't Google use a better protocol between Google devices and Google servers (oh wait, they do: QUIC, which is something-other-than-TCP over IP).

More people should be doing this, yes? Why not?

It is as if the layered architecture of the network isn't being taken advantage of by engineers.

As people build 4G-5G networks that are IP based, shouldn't we insist they build purely IP based networks and not peek into the layers above and make assumptions, and thereby enable more experimentation with flow control and reliable transmission protocols?


Because NAT. It is most commonly used as port address translation (one-to-many NAT), so it operates on TCP and UDP. That is not only a home router issue: mobile networks sometimes use carrier-grade NAT (NAT444). Any new protocol on top of IP has a problem with that, so nobody wants to implement something that is going to be broken for most customers.


Yeah, that's the reason for UDP wrapping the custom protocol.
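Roughly what "wrapping in UDP" means in practice: the custom protocol's header simply becomes the first bytes of an ordinary UDP payload, which NAT boxes already know how to forward. The header layout below (32-bit sequence number, 16-bit flags) is entirely hypothetical and not taken from QUIC or any real protocol.

```python
import socket
import struct

# Hypothetical header: 32-bit sequence number and 16-bit flags, network byte order.
HEADER = struct.Struct("!IH")


def encode(seq, flags, payload):
    """Prefix the payload with the custom header."""
    return HEADER.pack(seq, flags) + payload


def decode(datagram):
    """Split a received datagram back into (seq, flags, payload)."""
    seq, flags = HEADER.unpack_from(datagram)
    return seq, flags, datagram[HEADER.size:]


def roundtrip(seq, flags, payload):
    """Send one wrapped packet over loopback UDP and decode it on arrival."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))
        rx.settimeout(2)
        tx.sendto(encode(seq, flags, payload), rx.getsockname())
        data, _ = rx.recvfrom(2048)
        return decode(data)
    finally:
        rx.close()
        tx.close()


if __name__ == "__main__":
    print(roundtrip(7, 1, b"hello"))
```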


1. You don't want to reimplement a quarter of the network stack in userspace.

2. Network infrastructure may drop "unusual" packets.


> 1. You don't want to reimplement a quarter of the network stack in userspace.

That post doesn't say anything about user space. Ideally the new protocol would be in the kernel, triggered with just a flag or even automatically.


It's not reimplementing if it is a new protocol with different capabilities. There's no point in having a layered architecture if we cannot evolve the layers, especially the layers designated as endpoint layers.


Thought this article was interesting as I've been working with a piece of hardware lately that doesn't close SMTP connections after sending mail. Took me a while to figure that out: it would always send email properly the first time but then wouldn't be able to again until after a reboot. Turns out it doesn't close TCP sockets created while sending mail, unless you jump through some hoops. Such is the world of embedded industrial devices, unfortunately.


TLDR: TCP_USER_TIMEOUT is an important setting but it's somewhat tricky, not properly documented and there are kernel bugs related to it.
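For reference, setting it looks roughly like this. The option is Linux-specific and takes a value in milliseconds; the helper name set_user_timeout and the 30 second value are just examples, and the sketch returns None where the constant is not available.

```python
import socket


def set_user_timeout(sock, millis):
    """Set TCP_USER_TIMEOUT on `sock` and return the value read back.

    Returns None on platforms where Python does not expose the option.
    """
    opt = getattr(socket, "TCP_USER_TIMEOUT", None)
    if opt is None:
        return None
    sock.setsockopt(socket.IPPROTO_TCP, opt, millis)
    return sock.getsockopt(socket.IPPROTO_TCP, opt)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print(set_user_timeout(s, 30000))  # 30 seconds, an arbitrary example
    finally:
        s.close()
```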


This is indeed very problematic. I worked on an ONVIF test suite 3 years ago, and in some failing test cases the TCP socket could never die in time. That failed the certification as a whole, as none of the following unit test cases could continue. None of the immediately-kill-TCP-socket or socket-port-reuse tricks could help, at least not reliably.


System Of A Down intensifies



