Actually it does, and is quite measurable via standard tools like netperf. I have done this, but only because I've worked in Finance as a Linux monkey for the previous 9.5-ish years. I specialize in Linux tuning and reading far too much Linux kernel source code. That and big distributed systems.
In fact, modern kernels support this quite extensively and the RHEL7 documentation[1] has great tips on it for someone new to it.
You can tune an interface for max throughput via coalescing[2] on Linux via ethtool -C to change and ethtool -c to view current coalescing. Setting interrupt per packet helps with latency for certain workload types in addition to SO_BUSY_POLL[3] and the global or per-interface busy polling sysctl. However, interrupt per packet can trivially overwhelm CPUs if you don't isolate those cpus using things like isolcpus= on the linux grub commandline or cpu_exclusive=1 in a cpuset. RFS/RSS and NICs with multiple receive queues make this much easier to tune.
It is guaranteed to use more power, but you can do idle=poll[4] on the Linux kernel boot command line to stay in the highest C state and help with network latency.
You don't really always need an interrupt either as apps can DMA directly to and from the NIC <---> Application bypassing the CPU. How do you think RDMA works at a lower level? Finally, Linux "kernel bypass" networking aka 100% userspace tcp/ip stacks can be written to be truly interrupt-less take a look at Mellanox's VMA or Solar Flare's openonload. Heck, the default openonload config is interrupt-less. The more you know :)
It wouldn't matter because all your interrupts would be handled by a single core, becoming a bottleneck.
It also doesn't matter because heavy i/o will take up both system and user cpu, and the 90/10+ split (is the app even multi core??) puts you into full utilization, which is fine for bulk jobs, terrible for high performance requests. Even a single machine at 100% can (in unfortunate circumstances) cause domino effects. Better to build in excess capacity as a buffer for unexpected spikes, which means managing your clusters to not stack jobs which could compete for resources, but also not stack jobs that could unintentionally starve other jobs - this requires intelligent load balancing that's application and job specific. Or a cluster dedicated to specific jobs (which they have, ironically)
And you can go one step further using receive flow steering[1] or transmit flow steering. Most modern performance oriented network cards (Intel 10G, Solarflare 10G, anything from Mellanox, Chelsio, etc) surface each of these receive queues differently and can be seen as different on the right hand column in /proc/interrupts. You can distribute said rx/tx queues on different cores (ideally) on the same socket (but potentially a different core) as the application for minimum latency.
Linux has some really impressive knobs[2] for optimizing these sorts of weird workloads.