
That is effectively the case. See, for example, Intel's DDIO, which can only be enabled for specific devices with full cooperation between Intel and that particular vendor.

You cannot compete with a DDIO-enabled device, which of course all Intel devices are.

See also the Intel multi-buffer crypto library, which was specialized and tuned for Intel CPUs. No one else could write code at this level of optimization, because we do not have the internal design documents and simulators that Intel works with.

So yeah, you are talking to sophisticated hardware that will have firmware blobs and undocumented features. If you rely only on general instruction sets, you will only get so far. When we are talking about nanoseconds of latency and this level of bandwidth, those details make the difference between one stack and another.

The push for smart NICs will increasingly blur the line between the software and hardware layers. We can either direct our efforts toward a single shared abstraction layer on top of them, or end up rewriting one for each vendor-specific API (OFED is but one example; there will be others).




I will respectfully disagree :).

We have taken Intel's reference code (https://github.com/lukego/intel-ipsec/blob/master/code/avx2/...) for high-speed AES-GCM encryption and used DynASM (https://luajit.org/dynasm.html) to refactor it as a much smaller program (https://github.com/snabbco/snabb/blob/master/src/lib/ipsec/a...). I see this as highly worthwhile: we are working on making the software ecosystem simpler and tighter just because we are hackers, while Intel are working primarily on selling CPUs and whatever is best for their bottom line.
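For the curious, the mechanism DynASM automates boils down to emitting machine code into an executable buffer at run time and calling it. Here is a minimal sketch in C of just that bare idea (this is not the Snabb AES-GCM code or DynASM itself, only an illustration with a trivially small generated function):

    /*
     * Minimal sketch of the idea behind a dynamic assembler (what DynASM
     * automates with a much nicer syntax): emit x86-64 machine code into
     * an executable buffer at run time, then call it as a function.
     */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    #include <sys/mman.h>

    typedef int (*fn_t)(void);

    int main(void)
    {
        /* mov eax, imm32 ; ret -- specialize the immediate at "assembly time" */
        int32_t imm = 42;
        uint8_t code[6] = { 0xB8, 0, 0, 0, 0, 0xC3 };
        memcpy(&code[1], &imm, sizeof(imm));          /* patch the immediate */

        /* Allocate a page we are allowed to execute and copy the code in. */
        void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) return 1;
        memcpy(buf, code, sizeof(code));

        fn_t fn = (fn_t)buf;
        printf("generated function returned %d\n", fn());   /* prints 42 */

        munmap(buf, 4096);
        return 0;
    }

The real payoff of doing this for AES-GCM is that you can specialize the generated code for the case at hand instead of carrying every variant in a huge static assembly file.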

I disagree with this characterization of DDIO, but I don't think a Hacker News comment thread is the best venue for such low-level discussions. Hope to chat with you about it in some more suitable forum some time :) that would be fun.


FWIW, I would be quite interested in your view on DDIO - anything you can link?


Intel DDIO FAQ: http://www.intel.com/content/dam/www/public/us/en/documents/...

My understanding is that DDIO is an internal feature of the processor and works transparently with all PCI devices. Basically Intel extended the processor "uncore" to serve PCIe DMA requests via the L3 cache rather than directly to memory.
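For the curious, on Linux you can poke at the relevant configuration through the msr driver (modprobe msr, run as root). A sketch is below; note that the MSR address (0xC8B, often described as the IIO LLC way mask on recent Xeons) is an assumption on my part and is model-specific, not architectural:

    /*
     * Sketch: read the (assumed) DDIO-related LLC way mask via /dev/cpu/0/msr.
     * The MSR address 0xC8B is model-specific and not architecturally
     * guaranteed -- treat this as illustrative only.
     */
    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define IIO_LLC_WAYS_MSR 0xC8B   /* assumed, model-specific */

    int main(void)
    {
        int fd = open("/dev/cpu/0/msr", O_RDONLY);
        if (fd < 0) { perror("open /dev/cpu/0/msr"); return 1; }

        uint64_t val;
        if (pread(fd, &val, sizeof(val), IIO_LLC_WAYS_MSR) != sizeof(val)) {
            perror("pread");
            return 1;
        }
        /* Each set bit is an L3 way that inbound PCIe DMA may allocate into. */
        printf("IIO LLC way mask: 0x%llx\n", (unsigned long long)val);

        close(fd);
        return 0;
    }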


I think you're confusing DDIO with DCA. DDIO is Intel's mechanism for allocating L3 cache ways to DMA, and it works for any vendor's card. DCA is an older mechanism in which per-TLP steering hint flags influence whether or not a DMA write ends up in the CPU cache. DCA is highly targeted, and much more effective in realistic workloads because you can be smart and cache just the descriptor and packet-header DMA writes (i.e., metadata). With DDIO you end up caching everything, and with a limited number of cache ways you often end up caching nothing, because later DMAs push earlier ones out of the cache before the host can use the data.
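
To make the limited-ways point concrete, here is a toy model (the way count and burst size are made-up parameters, not measurements): a DMA burst larger than the DDIO-allocatable region evicts its own earlier lines before the host ever reads them.

    /*
     * Toy model of the "limited DDIO ways" problem (illustrative only).
     * DMA writes allocate into a small region; if a burst is larger than
     * that region, earlier writes are evicted before the host reads them,
     * so the cache ends up holding only the tail of the burst.
     */
    #include <stdio.h>

    #define DDIO_WAYS   2      /* assumed: lines DMA may occupy per set */
    #define BURST_LINES 16     /* assumed: lines written by one DMA burst */

    int main(void)
    {
        int resident[DDIO_WAYS];   /* which burst line each way holds */
        int n = 0;                 /* next victim (FIFO == LRU here, since
                                      the CPU never touches lines mid-burst) */

        for (int w = 0; w < DDIO_WAYS; w++) resident[w] = -1;

        /* The NIC DMAs BURST_LINES lines before the CPU reads any of them. */
        for (int line = 0; line < BURST_LINES; line++) {
            resident[n] = line;             /* allocate, evicting the oldest */
            n = (n + 1) % DDIO_WAYS;
        }

        /* Now the CPU walks the burst: how many lines are still cached? */
        int hits = 0;
        for (int line = 0; line < BURST_LINES; line++)
            for (int w = 0; w < DDIO_WAYS; w++)
                if (resident[w] == line) { hits++; break; }

        printf("%d of %d DMA'd lines still cached when the CPU reads them\n",
               hits, BURST_LINES);   /* 2 of 16 -> most reads hit DRAM anyway */
        return 0;
    }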

At a previous employer, we figured out the DCA steering hints and implemented them in our NIC. Thankfully, enough of our PCIe implementation was programmable to allow us to do this.



