
Programmable logic on the die sounds like a great thing in principle, but the place where it really comes into its own is I/O work: network/disk acceleration, offload, encryption. This is where hardware that is slow and wide, but reconfigurable over the software lifecycle (e.g. protocols, file systems, etc., which change rapidly), would be a benefit. So the real question is: what is the I/O capability of one of these things? Will the high-speed transceivers be exposed so that I/O devices can talk to them directly, or will everything have to go through a slow, high-latency PCIe interconnect? If the latter, then I'd predict a chocolate teapot in the making.



One can program the NIC to DMA packets directly into the address space allotted to the FPGA. Once set up, the FPGA should be able to get hold of the packets and start processing without a single CPU cycle used on the data plane.


Sure. This is a possibility, although it is a bit roundabout and there would be an interesting song and dance in the NIC driver. NICs are typically told where to DMA via descriptor tables programmed into the NIC by the driver. To do this truly without CPU intervention, you would need to write a hardware driver in the FPGA to program the NIC's descriptor tables (I can't even imagine what a nightmare that would be). Otherwise, you would have to have the CPU involved in setting up and negotiating transfers between the NIC and FPGA and a second driver between the FPGA and software. It's pretty messy either way, and given the proliferation of cheap FPGA-enabled NICs it seems like a non-starter. If the FPGA transceivers are broken out directly, then a simple adapter board would let the FPGA talk directly to the network and/or memory devices.
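
To make "programming the descriptor tables" concrete, here's a minimal sketch of the CPU-mediated setup path. The register names, struct layout and addresses are all hypothetical, not any real NIC's; the point is just that the driver fills the RX ring with bus addresses inside the FPGA's BAR window instead of host RAM, and after that the NIC-to-FPGA data plane needs no CPU:

    /* Hypothetical RX descriptor and NIC registers -- illustrative only. */
    #include <stdint.h>

    struct rx_desc {
        uint64_t buf_addr;   /* bus address the NIC will DMA the frame into */
        uint32_t buf_len;
        uint32_t flags;
    };

    #define RING_SIZE          256
    #define PKT_BUF_SIZE       2048
    #define FPGA_BAR_BUS_ADDR  0xF0000000ULL   /* assumed FPGA BAR base */

    static void setup_rx_ring(volatile struct rx_desc *ring,
                              uint64_t ring_bus_addr,
                              volatile uint64_t *nic_ring_base_reg,
                              volatile uint32_t *nic_ring_size_reg)
    {
        for (int i = 0; i < RING_SIZE; i++) {
            ring[i].buf_addr = FPGA_BAR_BUS_ADDR + (uint64_t)i * PKT_BUF_SIZE;
            ring[i].buf_len  = PKT_BUF_SIZE;
            ring[i].flags    = 0;
        }
        /* Point the NIC at the ring; from here the NIC DMAs straight into
         * the FPGA without touching the CPU, until descriptors need recycling. */
        *nic_ring_base_reg = ring_bus_addr;
        *nic_ring_size_reg = RING_SIZE;
    }

The nasty part isn't this one-off setup, it's descriptor recycling and error handling afterwards: either the FPGA grows logic to write the NIC's registers itself (the "hardware driver") or the CPU stays in the loop.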


> you would have to have the CPU involved in setting up and negotiating transfers between the NIC and FPGA and a second driver between the FPGA and software

Plenty of "kernel bypass" and RDMA type functions use shared/user-space memory for "zero-copy" (in reality one copy), operations between NIC and software. If a similar scheme can be used with the FPGA then it would not have too much overhead. I agree, not as direct/efficient as having FPGA serdes I/O go directly to some SPF+/network transceiver, but then you'd also be taking up valuable FPGA gate capacity to run NIC PHY/MAC and standard L2/L3 processing functions that you get from a NIC.
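
For reference, the kernel-bypass RX pattern is roughly the following (names and layout are made up for illustration, not any particular vendor's API); the assumption is that a buffer region and a completion ring have already been mmap'd into user space and registered with the NIC:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical completion-queue entry the NIC writes after each DMA. */
    struct cq_entry {
        volatile uint32_t ready;   /* set by the NIC when the packet has landed */
        uint32_t buf_index;
        uint32_t len;
    };

    /* Busy-poll the completion ring; packets are handled in place in 'bufs',
     * so there is no copy into kernel socket buffers (the one remaining copy
     * is the NIC's DMA off the wire into 'bufs'). */
    static void rx_poll(struct cq_entry *cq, size_t cq_size,
                        uint8_t *bufs, size_t buf_size,
                        void (*handle)(uint8_t *pkt, uint32_t len))
    {
        size_t head = 0;
        for (;;) {
            struct cq_entry *e = &cq[head];
            if (!e->ready)
                continue;                      /* spin, no syscalls */
            handle(bufs + (size_t)e->buf_index * buf_size, e->len);
            e->ready = 0;                      /* return the buffer  */
            head = (head + 1) % cq_size;
        }
    }

If the FPGA could map the same buffer region, 'handle' could reduce to a doorbell write telling it where the packet sits, rather than another copy.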


RDMA/kernel-bypass NICs work by mapping chunks of RAM and then automatically DMA'ing packets into those chunks. Again, it would be a pretty roundabout way to give the FPGA access to packets: copy data to RAM, then copy down to the FPGA, then copy back up to RAM. Much simpler/better to let the data stream through the FPGA from/to the wire. In addition, the PHY/MAC layers are pretty thin these days for Ethernet-style devices, and modern FPGAs are huge by comparison. I'm not saying it can't be done, I'm just saying it seems sub-optimal when FPGAs already have a ton of I/O resources and are already used as NICs. The question of whether these resources are exposed to the outside world is the salient one.
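
Spelled out, the per-packet path with a PCIe-attached FPGA looks something like this (all mappings and names hypothetical, and real MMIO access would need proper barriers/accessors; it's just to count the trips across the bus):

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* rx_buf: host RAM the NIC already DMA'd the frame into (copy 1).
     * fpga_win: an FPGA BAR window mapped into the host address space.
     * tx_buf: host RAM the NIC will transmit from (copy 4, on TX DMA). */
    static void process_via_fpga(const uint8_t *rx_buf, size_t len,
                                 uint8_t *fpga_win, uint8_t *tx_buf)
    {
        memcpy(fpga_win, rx_buf, len);   /* copy 2: host RAM -> FPGA over PCIe */
        /* ... FPGA processes the packet in place ... */
        memcpy(tx_buf, fpga_win, len);   /* copy 3: FPGA -> host RAM over PCIe */
    }

With the serdes broken out, all of that collapses to the packet streaming through the FPGA fabric once on its way to or from the wire.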


These slides mention the possibility of using PCIe or QPI to attach to the CPU: http://www.ece.cmu.edu/~calcm/carl/lib/exe/fetch.php?media=c....


I think I may not have been clear enough. We can take it as a given that there will be some kind of high-speed interconnect between the CPU and the FPGA (as you say, PCIe or QPI or whatever). But that is not the I/O problem I'm worried about. What I'm wondering is how the FPGA gets access to the outside world. Does everything have to go through the CPU (which would be slow), or would the FPGA be able to use its own multi-gigabit transceivers to talk directly to I/O devices (like NICs or HMC or flash)?

I always imagined that the best use of FPGAs in systems like this would be as an I/O coprocessor. If the only way to get to the FPGA is via the CPU, then most (all?) of the benefit is lost.


Although, having taken a look at those slides, it appears the FPGA is given its own PCIe connectivity to the outside world. That could be an interesting way to interact with the outside world.


The Arria 10 GX can do up to 16x 30Gbps transceivers and 96x 17Gbps transceivers - I can't imagine they'd turn off all of these, since that is the selling point.


The question is where they are broken out, if at all. A bunch will be used for the QPI/PCIe interface to the CPU, but how many get broken out to play with in the real world? If the slides are correct, 8x will be broken out as PCIe, so that's something, I guess. But not very much, which is kind of the point I'm making. FPGAs are great I/O-processor devices, but if you can't get at the I/O, that use case is toast.
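
Back-of-envelope, assuming the slides' 8x PCIe is Gen3 and using the GX transceiver counts quoted above:

    8x PCIe Gen3:  8 lanes x 8 GT/s x 128b/130b   ~   63 Gbps each way
    raw serdes:    16 x 30 Gbps + 96 x 17 Gbps    = 2112 Gbps

So on those assumptions only a few percent of the chip's raw transceiver bandwidth ever reaches the outside world.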


Infiniband as a connection technology would be interesting (to me), but Intel has been developing something called Omni-Path, which seems to be their version of a successor to it.

Kind of wondering if these will work with Intel's Omni-Path, and if so, what shape that puts things in...

In theory, it could be very interesting. In reality, though, we get to find out. :)


Check out Netronome. www.netronome.com

They do exactly this in a programmable PCIe card and custom ASIC.


I have used/programmed their cards. It was the worst 3 months of my life. I will never willingly touch their stuff again.

A buggy compiler implementing a superset of a subset of C89, with totally crazy macro extensions and bizarre locality properties (e.g. manually declaring whether a variable lives in a register or in RAM); bad, impossible-to-decipher (machine-generated!) "documentation"; three (!!) different and incompatible "standard" libraries, each implementing different sets of features, two of which were written in assembler and inaccessible from C directly; and almost no debugging tooling (e.g. I had to write my own locking library because there wasn't one). What a nightmare.



