Hacker News

This must be the 10th blog post on this topic to land on HN, and they all walk through the same steps and all use the same hardware (ixgbe), which, by the way, is a hard prerequisite for most of these strategies to be effective.

In any case, stop reinventing the wheel; just use a purpose-made library:

http://dpdk.org/




I am a Snabb hacker and I see things differently. Ethernet I/O is fundamentally a simple problem, DPDK is taking the industry in the wrong direction, and application developers should fight back.

Ethernet I/O is simple at heart. You have an array of pointer+length packets that you want to send, an array of pointer+length buffers where you want to receive, and some configuration like "hash across these 10 rings" or "pick a ring based on VLAN-ID." This should not be more work than, say, a JSON parser. (However, if you aren't vigilant you could easily make it as complex as a C++ parser.)
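To make the "array of pointer+length packets" claim concrete, here is a minimal sketch of a transmit descriptor ring in C. This is a software model with made-up names, not Snabb's API or any real NIC's register layout; on real hardware the ring would live in DMA-able memory and the tail index would be a device register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the core data structure described above:
 * a ring of pointer+length descriptors. Names are illustrative. */

#define RING_SIZE 256 /* power of two so wraparound is a cheap mask */

struct desc {
    void    *addr; /* pointer to packet data */
    uint16_t len;  /* packet length in bytes */
};

struct tx_ring {
    struct desc ring[RING_SIZE];
    uint32_t head; /* next descriptor the "NIC" will consume */
    uint32_t tail; /* next descriptor the driver will fill */
};

/* Enqueue one packet; returns 0 on success, -1 if the ring is full. */
static int tx_enqueue(struct tx_ring *r, void *pkt, uint16_t len) {
    uint32_t next = (r->tail + 1) & (RING_SIZE - 1);
    if (next == r->head) return -1; /* full */
    r->ring[r->tail].addr = pkt;
    r->ring[r->tail].len  = len;
    r->tail = next; /* on hardware: write the new tail to a NIC register */
    return 0;
}

/* Simulate the NIC consuming one descriptor; returns number consumed. */
static int tx_consume(struct tx_ring *r) {
    if (r->head == r->tail) return 0; /* empty */
    r->head = (r->head + 1) & (RING_SIZE - 1);
    return 1;
}
```

That really is most of the data path; the remaining driver work is device bring-up and configuration (ring setup, hashing/steering rules), which is where the complexity budget should go.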

DPDK has created a direct vector for hardware vendors to ship code into applications. Hardware vendors have specific interests: they want to differentiate themselves with complicated features, they want to get their product out the door quickly even if that means throwing bodies at a complicated implementation, and they want to optimize for the narrow cases that will look good on their marketing literature. They are happy for their complicated proprietary interfaces to propagate throughout the software ecosystem. They also focus their support on their big customers via account teams and aren't really bothered about independent developers or people on non-mainstream platforms.

Case in point: We want to run Snabb on Mellanox NICs. If we adopt the vendor ecosystem then we are buying into four (!) large software ecosystems: Linux kernel (mlx5 driver), Mellanox OFED (control plane), DPDK (data plane built on OFED+kernel), and Mellanox firmware tools (mostly non-open-source, strangely licensed, distributed as binaries that only work on a few distros). In practice it will be our problem to make sure these all play nice together, and that will be a challenge, e.g. in a container environment where we don't have control over which kernel is used. We also have to accept the engineering trade-offs that the vendor engineering team has made, which in this case seem to include special optimizations to game benchmarks [1].

I say forget that for a joke.

Instead we have done a bunch more work up front to first successfully lobby the vendor to release their driver API [2] and then to write a stand-alone driver of our own [3] that does not depend on anything else (kernel, ofed, dpdk, etc). This is around 1 KLOC of Lua code when all is said and done.

I would love to hear from other people who want to join the ranks of self-sufficient application developers. Honestly our ConnectX driver has been a lot of work but it should be much easier for the next guy/gal to build on our experience. If you needed a JSON parser you would not look for a 100 KLOC implementation full of weird vendor extensions, so why do that for an ethernet driver?

[1] http://dpdk.org/ml/archives/dev/2016-September/046705.html

[2] http://www.mellanox.com/related-docs/user_manuals/Ethernet_A...

[3] https://github.com/snabbco/snabb/blob/mellanox/src/apps/mell...


> DPDK is taking the industry in the wrong direction, and application developers should fight back.

DPDK is doing the exact same work you did: making hardware vendors release their driver APIs and abstracting them away so that application developers can stay independent of them.

You "successfully lobbied" for one API to be released. Now do that for any number of hardware vendors and NIC revisions, and in the end you will have to release a generic API, which is effectively a new DPDK.

Completely independent applications will only go so far. You are left with vendor lock-in and a very high upfront cost if you ever need to evolve your hardware.


I understand your perspective. If you are satisfied with using a vendor-provided software stack to interface with hardware then you are well catered for by DPDK and do not have to care what is under the hood.

I feel that the hardware-software interface is fundamental and that vendors should not control the software. I see an analogy to CPUs. I am really happy that CPU vendors document their instruction sets and support independent compiler developers. I would be disappointed if they started keeping their instruction sets confidential, available only under NDA, and told everybody to just use their LLVM backend without understanding it.


That is effectively already the case. See, for example, Intel's DDIO, which can only be enabled for specific devices with full cooperation between Intel and that particular vendor.

You cannot compete with a DDIO-enabled device, which of course all Intel devices are.

See also the Intel multi-buffer crypto library, which was specialized and tuned for Intel CPUs. No one else could write code at that level of optimization, because we do not have the internal design documents and simulators that Intel works with.

So yeah, you are talking to sophisticated hardware that will have firmware blobs and undocumented features. If you rely only on the general instruction sets you will only get so far. When we are talking about nanoseconds of latency at these levels of bandwidth, such features make the difference between competing stacks.

The push for smart NICs will increasingly blur the line between the software and hardware layers. We can either pool our efforts into a single abstraction layer on top of them, or rewrite one for each vendor-specific API (OFED is but one example; there will be others).


I will respectfully disagree :).

We have taken Intel's reference code (https://github.com/lukego/intel-ipsec/blob/master/code/avx2/...) for high-speed AES-GCM encryption and used DynASM (https://luajit.org/dynasm.html) to refactor it as a much smaller program (https://github.com/snabbco/snabb/blob/master/src/lib/ipsec/a...). I see this as highly worthwhile: we are working on making the software ecosystem simpler and tighter just because we are hackers, while Intel are working primarily on selling CPUs and whatever is best for their bottom line.

I disagree with this characterization of DDIO, but I don't think the Hacker News comment section is the best venue for such low-level discussions. Hope to chat with you about it in some more suitable forum some time :) that would be fun.


FWIW, I would be quite interested in your view on DDIO - anything you can link?


Intel DDIO FAQ: http://www.intel.com/content/dam/www/public/us/en/documents/...

My understanding is that DDIO is an internal feature of the processor and works transparently with all PCI devices. Basically Intel extended the processor "uncore" to serve PCIe DMA requests via the L3 cache rather than directly to memory.


I think you're confusing DDIO with DCA. DDIO is Intel's mechanism for allocating L3 cache ways to DMA, and it works for any vendor's card. DCA is an older mechanism in which per-TLP steering hints influence whether or not a DMA write ends up in the CPU cache. DCA is highly targeted, and much more effective in realistic workloads because you can be smart and cache just the descriptor and packet-header DMA writes (e.g., metadata). With DDIO you end up caching everything, and with a limited number of cache ways you often end up caching nothing, because later DMAs push earlier ones out of the cache before the host can use the data.
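The eviction effect being described can be shown with a toy model in C: a handful of cache ways reserved for DMA, filled by a burst of writes that is larger than the reserved capacity. This is purely illustrative; the way count, replacement policy, and everything else here are made up and do not reflect actual silicon.

```c
#include <assert.h>
#include <string.h>

/* Toy model of DDIO-style contention: all DMA writes allocate into a
 * small number of L3 ways, so a burst of writes can evict earlier
 * data before the host has read it. Sizes are hypothetical. */

#define DDIO_WAYS 4 /* assumed: only a few ways reserved for DMA */

struct ddio_ways {
    int line[DDIO_WAYS]; /* tag cached in each way, -1 = empty */
    int next;            /* simple FIFO replacement for the model */
};

static void ddio_init(struct ddio_ways *c) {
    for (int i = 0; i < DDIO_WAYS; i++) c->line[i] = -1;
    c->next = 0;
}

/* A DMA write allocates into the reserved ways, evicting the oldest. */
static void dma_write(struct ddio_ways *c, int tag) {
    c->line[c->next] = tag;
    c->next = (c->next + 1) % DDIO_WAYS;
}

/* Host read: 1 = cache hit, 0 = already evicted to memory. */
static int host_read(const struct ddio_ways *c, int tag) {
    for (int i = 0; i < DDIO_WAYS; i++)
        if (c->line[i] == tag) return 1;
    return 0;
}
```

Writing eight lines into four ways leaves only the last four resident, which is the "caching everything ends up caching nothing" failure mode in miniature; per-TLP hints (the DCA approach) would let the driver mark only descriptors and headers for allocation.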

At a previous employer, we figured out the DCA steering hints and implemented them in our NIC. Thankfully enough of our PCIe implementation was programmable to allow us to do this.


Don't you have a financial interest in snabb succeeding and dpdk failing?


No. The way to make the most money is to align yourself with vendors. Mine is a technical interest in making networking software simpler.


You're writing your driver in Lua? Am I missing something?


LuaJIT is a very good tracing JIT compiler, and DynASM's Lua mode lets us embed assembly in Lua where necessary.


Depending on what you are doing, with latencies (or throughput) in that range, sticking a black-box library in there right away is not always the best idea. Doing what the author did is also a way to learn how things work. Eventually the library might be the answer, but if I had to do what they did, I would do it by hand first as well.


I doubt there's a single real use case out there for which DPDK isn't fast enough but custom hardware isn't warranted.


Plus it also supports even lower-level drivers for a bunch of cards (some VM-virtualised, such as the Intel em), as well as AF_PACKET, oh, and pcap.


And OP reimplemented the subset he actually needed in half a KLOC.


Because sometimes you do not want to replace one abstraction with another if the very point is to remove a layer.


Can you elaborate on what you mean by the Intel NICs/drivers being a hard prerequisite here?


Lots of the low-latency options require driver and hardware cooperation: busy polling, BQL, essentially all of the ethtool options, even IRQ affinity.

Intel has been a driving force behind many kernel networking improvements, but they naturally don't care about other manufacturers, so they implement a little bit of kernel infrastructure and put the rest into their drivers.


Understood, agreed. Thanks for the clarification.


This. The author does not even address why they are not using DPDK. Are they using AMD server CPUs?



