
Or turn interrupts completely off and poll. If you always have work in the queue, polling is less intensive, not more.



That makes no sense. Polling means the core is running at 100% all the time.


Yes, but it's actually faster. Source: scylladb.


Still doesn't make sense in the context of a network card. There must be an interrupt to pre-empt the system or there's gonna be packet loss.


No, you just need to service the work before the buffer overflows.


Actually it does, and it's quite measurable with standard tools like netperf. I have done this, but only because I've worked in finance as a Linux monkey for the past 9.5-ish years. I specialize in Linux tuning and reading far too much Linux kernel source code. That and big distributed systems.

In fact, modern kernels support this quite extensively, and the RHEL7 documentation[1] has great tips for anyone new to it.

On Linux you can tune an interface for max throughput with interrupt coalescing[2]: ethtool -C changes the settings and ethtool -c shows the current ones. Going the other way, one interrupt per packet helps with latency for certain workload types, as do SO_BUSY_POLL[3] and the global or per-interface busy-polling sysctl. However, interrupt-per-packet can trivially overwhelm CPUs if you don't isolate those CPUs using things like isolcpus= on the Linux grub command line or cpu_exclusive=1 in a cpuset. RFS/RSS and NICs with multiple receive queues make this much easier to tune.
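If anyone wants to poke at the SO_BUSY_POLL side of that, here's a minimal sketch, assuming a plain UDP socket and a made-up 50 microsecond budget (tune for your own workload; the driver has to support busy polling too):

    /* Minimal sketch: opt one UDP socket into busy polling.
     * SO_BUSY_POLL takes a budget in microseconds; on a blocking
     * read the kernel spins in the driver's poll routine for up to
     * that long before falling back to the normal interrupt path.
     * The 50us value is a placeholder; setting it typically needs
     * CAP_NET_ADMIN and busy-poll support in the driver. */
    #include <stdio.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        int busy_poll_usec = 50;   /* placeholder budget */

        if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
                                 &busy_poll_usec, sizeof(busy_poll_usec)) < 0) {
            perror("SO_BUSY_POLL");
            return 1;
        }
        /* ... bind() here; blocking recv() calls now busy-poll ... */
        return 0;
    }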

It is guaranteed to use more power, but you can put idle=poll[4] on the Linux kernel boot command line to keep cores pegged in the most active C state (no deep sleep states), which helps with network latency.

You don't really always need an interrupt either, as apps can DMA directly to and from the NIC, bypassing the CPU. How do you think RDMA works at a lower level? Finally, Linux "kernel bypass" networking, a.k.a. 100% userspace TCP/IP stacks, can be written to be truly interrupt-less: take a look at Mellanox's VMA or Solarflare's openonload. Heck, the default openonload config is interrupt-less. The more you know :)
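To make the "interrupt-less from the app's point of view" bit concrete, the receive side is basically just a spin loop. This sketch is ordinary sockets on a stock kernel (so NAPI is still underneath); the point is that the LD_PRELOAD bypass stacks mentioned above intercept exactly these calls and serve them by polling the NIC from userspace. handle_packet is hypothetical:

    /* Spin on a non-blocking recv() instead of sleeping and waiting
     * to be woken: the application-level shape of an interrupt-less
     * receive path.  EAGAIN just means "nothing there yet, keep
     * spinning"; anything else is a real error. */
    #include <errno.h>
    #include <sys/socket.h>

    static void spin_receive(int fd)
    {
        char buf[2048];

        for (;;) {
            ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
            if (n > 0) {
                /* handle_packet(buf, n);  hypothetical handler */
            } else if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
                break;
            }
        }
    }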

[1] https://access.redhat.com/documentation/en-US/Red_Hat_Enterp...

[2] https://en.wikipedia.org/wiki/Interrupt_coalescing

[3] http://man7.org/linux/man-pages/man7/socket.7.html

[4] https://access.redhat.com/articles/65410

EDIT: This is a fantastic and simple overview with actual netperf results showing how this helps with latency: http://www.intel.com/content/dam/www/public/us/en/documents/...


Are you talking about the NIC interrupts or all hardware interrupts? If the latter, how do you configure that?


It wouldn't matter because all your interrupts would be handled by a single core, becoming a bottleneck.

It also doesn't matter, because heavy I/O will take up both system and user CPU, and the 90/10+ split (is the app even multi-core??) puts you into full utilization, which is fine for bulk jobs but terrible for high-performance requests. Even a single machine at 100% can (in unfortunate circumstances) cause domino effects. Better to build in excess capacity as a buffer for unexpected spikes, which means managing your clusters so you don't stack jobs that compete for resources, but also don't stack jobs that could unintentionally starve other jobs - this requires intelligent load balancing that's application- and job-specific. Or a cluster dedicated to specific jobs (which they have, ironically).


You can assign different interrupts to different cores. That's another advanced optimization.

e.g. One core per network card.
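For anyone wondering what that looks like in practice: each IRQ has a CPU bitmask under /proc/irq/<N>/smp_affinity. It's usually a one-line echo as root, but here's the same thing as a tiny C sketch. The IRQ number 42 and the mask are placeholders, look your NIC's vectors up in /proc/interrupts first, and irqbalance will fight you unless it's off or told to ignore that IRQ:

    /* Sketch: pin IRQ 42 (placeholder) to CPU 1 by writing a hex
     * CPU bitmask to its smp_affinity file.  Equivalent to:
     *   echo 2 > /proc/irq/42/smp_affinity
     * Needs root. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/irq/42/smp_affinity", "w");
        if (!f) {
            perror("fopen");
            return 1;
        }
        fprintf(f, "2\n");   /* 0x2 == CPU 1 */
        return fclose(f) ? 1 : 0;
    }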


And you can go one step further using receive flow steering[1] or transmit packet steering. Most modern performance-oriented network cards (Intel 10G, Solarflare 10G, anything from Mellanox, Chelsio, etc.) expose their receive queues as separate interrupts, each visible by name in the right-hand column of /proc/interrupts. You can distribute said rx/tx queues across cores, ideally on the same socket as the application (but potentially a different core), for minimum latency.

Linux has some really impressive knobs[2] for optimizing these sorts of weird workloads.

[1] https://lwn.net/Articles/382428/

[2] https://www.kernel.org/doc/Documentation/networking/scaling....
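To make the software (RPS) half of [2] concrete, the per-queue knob is just a sysfs cpumask. The device name, queue number, and mask below are placeholders for your own setup; again this is normally a one-line echo as root:

    /* Sketch: steer packets arriving on eth0's rx-0 queue onto
     * CPUs 2 and 3 via RPS.  Equivalent to:
     *   echo c > /sys/class/net/eth0/queues/rx-0/rps_cpus
     * eth0, rx-0, and the mask are placeholders. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/sys/class/net/eth0/queues/rx-0/rps_cpus", "w");
        if (!f) {
            perror("fopen");
            return 1;
        }
        fprintf(f, "c\n");   /* 0xc == CPUs 2 and 3 */
        return fclose(f) ? 1 : 0;
    }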



