What if we changed the kernel or tcp stack to hold on to the packet for only a short time before sending it out. This could allow you to balance the latency against the network cost of many small packets. The tcp stack could even do it dynamically if needed.