My guess would be that the server assumes that every call to recv() terminates on a message boundary.
With TCP_NODELAY and small messages, this works out fine. Every message is contained in a single packet, and the userspace buffer being read into is large enough to contain it. As such, whenever the kernel has any data to give to userspace, it has an integer number of messages to give. Nothing requires the kernel to respect that, but it will not go out of its way to break it.
In contrast, without TCP_NODELAY, messages get concatenated and then fragmented based on where packet boundaries occur. Now, the natural end point for a call to recv() is not the message boundary, but the packet boundary.
The server is supposed to see that it is in the middle of a message, and make another call to recv() to get the rest of it; but clearly it does not do that.
With TCP_NODELAY and small messages, this works out fine. Every message is contained in a single packet, and the userspace buffer being read into is large enough to contain it. As such, whenever the kernel has any data to give to userspace, it has an integer number of messages to give. Nothing requires the kernel to respect that, but it will not go out of its way to break it.
In contrast, without TCP_NODELAY, messages get concatenated and then fragmented based on where packet boundaries occur. Now, the natural end point for a call to recv() is not the message boundary, but the packet boundary.
The server is supposed to see that it is in the middle of a message, and make another call to recv() to get the rest of it; but clearly it does not do that.