
That is effectively the case. See, for example, Intel's DDIO, which can only be enabled for specific devices with full cooperation between Intel and that particular vendor.

You cannot compete with a DDIO-enabled device, which of course all Intel devices are.

See also the Intel multi-buffer crypto library, which was specialized and tuned for Intel CPUs. No one else could write code at this level of optimization, because we do not have the internal design documents and simulators that Intel works with.

So yeah, you are talking to sophisticated hardware that will have firmware blobs and undocumented features. If you rely only on general instruction sets, you will only get so far. When we are talking about nanoseconds of latency and this level of bandwidth, those details make the difference between one stack and another.

The push for smart NICs will increasingly blur the line between the software and hardware layers. We can either direct our efforts toward a single shared abstraction layer on top of them, or end up rewriting one for each vendor-specific API (OFED is but one example; there will be others).




I will respectfully disagree :).

We have taken Intel's reference code (https://github.com/lukego/intel-ipsec/blob/master/code/avx2/...) for high-speed AES-GCM encryption and used DynASM (https://luajit.org/dynasm.html) to refactor it as a much smaller program (https://github.com/snabbco/snabb/blob/master/src/lib/ipsec/a...). I see this as highly worthwhile: we are working on making the software ecosystem simpler and tighter just because we are hackers, while Intel are working primarily on selling CPUs and whatever is best for their bottom line.
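For the curious, the mechanism DynASM automates boils down to emitting machine code into an executable buffer at run time and calling it. Here is a minimal sketch in C of just that bare idea (this is not the Snabb AES-GCM code or DynASM itself, only an illustration with a trivially small generated function):

    /*
     * Minimal sketch of the idea behind a dynamic assembler (what DynASM
     * automates with a much nicer syntax): emit x86-64 machine code into
     * an executable buffer at run time, then call it as a function.
     */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    #include <sys/mman.h>

    typedef int (*fn_t)(void);

    int main(void)
    {
        /* mov eax, imm32 ; ret -- specialize the immediate at "assembly time" */
        int32_t imm = 42;
        uint8_t code[6] = { 0xB8, 0, 0, 0, 0, 0xC3 };
        memcpy(&code[1], &imm, sizeof(imm));          /* patch the immediate */

        /* Allocate a page we are allowed to execute and copy the code in. */
        void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) return 1;
        memcpy(buf, code, sizeof(code));

        fn_t fn = (fn_t)buf;
        printf("generated function returned %d\n", fn());   /* prints 42 */

        munmap(buf, 4096);
        return 0;
    }

The real payoff of doing this for AES-GCM is that you can specialize the generated code for the case at hand instead of carrying every variant in a huge static assembly file.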

I disagree with this characterization of DDIO, but I don't think a Hacker News comment thread is the best venue for such low-level discussions. Hope to chat with you about it in some more suitable forum some time :) that would be fun.


FWIW, I would be quite interested in your view on DDIO - anything you can link?


Intel DDIO FAQ: http://www.intel.com/content/dam/www/public/us/en/documents/...

My understanding is that DDIO is an internal feature of the processor and works transparently with all PCI devices. Basically Intel extended the processor "uncore" to serve PCIe DMA requests via the L3 cache rather than directly to memory.
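For the curious, on Linux you can poke at the relevant configuration through the msr driver (modprobe msr, run as root). A sketch is below; note that the MSR address (0xC8B, often described as the IIO LLC way mask on recent Xeons) is an assumption on my part and is model-specific, not architectural:

    /*
     * Sketch: read the (assumed) DDIO-related LLC way mask via /dev/cpu/0/msr.
     * The MSR address 0xC8B is model-specific and not architecturally
     * guaranteed -- treat this as illustrative only.
     */
    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define IIO_LLC_WAYS_MSR 0xC8B   /* assumed, model-specific */

    int main(void)
    {
        int fd = open("/dev/cpu/0/msr", O_RDONLY);
        if (fd < 0) { perror("open /dev/cpu/0/msr"); return 1; }

        uint64_t val;
        if (pread(fd, &val, sizeof(val), IIO_LLC_WAYS_MSR) != sizeof(val)) {
            perror("pread");
            return 1;
        }
        /* Each set bit is an L3 way that inbound PCIe DMA may allocate into. */
        printf("IIO LLC way mask: 0x%llx\n", (unsigned long long)val);

        close(fd);
        return 0;
    }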


I think you're confusing DDIO with DCA. DDIO is Intel's mechanism for allocating L3 cache ways to DMA, and it works for any vendor's card. DCA is an older mechanism in which per-TLP steering hint flags influence whether or not a DMA write ends up in the CPU cache. DCA is highly targeted, and much more effective in realistic workloads because you can be smart and cache just the descriptor and packet-header DMA writes (i.e., metadata). With DDIO you end up caching everything, and with a limited number of cache ways you often end up caching nothing, because later DMAs push earlier ones out of the cache before the host can use the data.
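
To make the limited-ways point concrete, here is a toy model (the way count and burst size are made-up parameters, not measurements): a DMA burst larger than the DDIO-allocatable region evicts its own earlier lines before the host ever reads them.

    /*
     * Toy model of the "limited DDIO ways" problem (illustrative only).
     * DMA writes allocate into a small region; if a burst is larger than
     * that region, earlier writes are evicted before the host reads them,
     * so the cache ends up holding only the tail of the burst.
     */
    #include <stdio.h>

    #define DDIO_WAYS   2      /* assumed: lines DMA may occupy per set */
    #define BURST_LINES 16     /* assumed: lines written by one DMA burst */

    int main(void)
    {
        int resident[DDIO_WAYS];   /* which burst line each way holds */
        int n = 0;                 /* next victim (FIFO == LRU here, since
                                      the CPU never touches lines mid-burst) */

        for (int w = 0; w < DDIO_WAYS; w++) resident[w] = -1;

        /* The NIC DMAs BURST_LINES lines before the CPU reads any of them. */
        for (int line = 0; line < BURST_LINES; line++) {
            resident[n] = line;             /* allocate, evicting the oldest */
            n = (n + 1) % DDIO_WAYS;
        }

        /* Now the CPU walks the burst: how many lines are still cached? */
        int hits = 0;
        for (int line = 0; line < BURST_LINES; line++)
            for (int w = 0; w < DDIO_WAYS; w++)
                if (resident[w] == line) { hits++; break; }

        printf("%d of %d DMA'd lines still cached when the CPU reads them\n",
               hits, BURST_LINES);   /* 2 of 16 -> most reads hit DRAM anyway */
        return 0;
    }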

At a previous employer, we figured out the DCA steering hints and implemented them in our NIC. Thankfully, enough of our PCIe implementation was programmable to allow us to do this.



