Improving Network Performance with Linux Flowtables (ubicloud.com)
160 points by furkansahin 10 months ago | 26 comments



Showing the netfilter hooks alongside the OSI layers on the left, after all these years, is absolutely clutch, and as an educator I'm thankful the author made such a thing. It's beautiful.

I feel like, from an abstraction standpoint, a lot of these concepts get lost when you transition to or from Windows, and these pre/post chains never quite made sense to me on the surface. Though I'm positive that's because I'm not a developer or sysadmin in Linux daily. I imagine there's some fascinating stuff you can do.


This diagram [0] from Wikipedia has always been my favorite way of understanding the flow of a packet through the kernel.

[0] https://upload.wikimedia.org/wikipedia/commons/3/37/Netfilte...


Hey! Author here :) Thanks for the kind words, but I have to redirect them to another blog post, written by a friend named Andrej Stender. I used the image with his permission. Check out his post; it's where I learned about Flowtables: https://thermalcircle.de/doku.php?id=blog:linux:flowtables_1....


> In Ubicloud’s case, enabling flowtables just took seven lines of code!

Could have been six lines by combining these two lines:

    ip protocol tcp counter flow offload @ubi_flowtable
    ip protocol udp counter flow offload @ubi_flowtable
into:

    meta l4proto { tcp, udp } flow offload @ubi_flowtable
Also, their changes only work for IPv4. The above would work for both IPv4 and IPv6.
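
For context, here's roughly how that single rule could sit in a dual-stack (inet family) table. This is only a sketch; the table, chain, and device names are made up:

    table inet ubi_filter {
        flowtable ubi_flowtable {
            hook ingress priority 0; devices = { eth0 };
        }
        chain forward {
            type filter hook forward priority 0; policy accept;
            # one rule offloads TCP and UDP flows for both IPv4 and IPv6
            meta l4proto { tcp, udp } flow offload @ubi_flowtable
        }
    }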


That's a good one. Ubicloud has decent IPv6 support, as we use it for our underlay network (in fact, for a while as a prototype it only supported IPv6), but we missed a trick here.

https://github.com/ubicloud/ubicloud/pull/1322


You're most welcome, happy to help!


Hello from the author here! I wanted to explain that we currently use nftables for NATing, firewall rules, and some spoofing-avoidance tasks. Enabling flowtables benefits the full networking stack for any connection. Give it a try!

Also, happy to answer if there are any questions.
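
If you try it, you can check that flows are actually being offloaded (assuming conntrack-tools is installed and a flowtable is already defined):

    # list flowtables known to nftables
    nft list flowtables
    # offloaded connections carry the [OFFLOAD] flag in conntrack
    conntrack -L | grep OFFLOAD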



Gave this a shot on my home server that's running a bunch of Docker containers. It certainly feels like it's improved network performance. Next step is to run a bunch of benchmarks.
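
For the benchmarks, something along these lines should do for a before/after comparison (the hostname is a placeholder; run once with the flow offload rule present and once without):

    # on the server side
    iperf3 -s
    # on the client: TCP throughput with 4 parallel streams
    iperf3 -c homeserver.lan -t 30 -P 4
    # and a UDP run at unlimited rate
    iperf3 -c homeserver.lan -u -b 0 -t 30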


If it performs better, why isn't this the default in the Linux networking stack? What are the drawbacks of using it, and are there security implications?


There are a couple of important points to consider when using flowtables. Depending on the traffic characteristics, flowtables may actually reduce network performance. It also depends on the number of rules and their configuration. For example, if we had many more rules in the nftables file for many different operations (think of it as adding a new rule per IP address), flowtables would hurt performance at Ubicloud. They go hand in hand with how the overall network is configured. In our case it helps us, but we have a fairly simple nftables definition; you can see a big part of it here: https://github.com/ubicloud/ubicloud/blob/main/prog/vnet/upd...
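
To make the "a new rule per IP address" picture concrete, a hypothetical pattern like the first one below makes matching cost grow with every address, whereas a set keeps the lookup cheap:

    # hypothetical: one linear rule per source address
    ip saddr 203.0.113.1 accept
    ip saddr 203.0.113.2 accept
    ip saddr 203.0.113.3 accept
    # the same match as a single set lookup
    ip saddr { 203.0.113.1, 203.0.113.2, 203.0.113.3 } accept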


I wonder if there are any security implications to consider, particularly in a multi-tenant environment, when caching routing information for the "same" connection.


In this case, no. The reason is that we set up separate network namespaces per VM, and the flowtables are also created separately per namespace.
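
Roughly like this, as a sketch; the names are hypothetical, and a flowtable defined inside a namespace only sees that namespace's devices:

    # create an isolated namespace for one VM
    ip netns add vm1
    # define a table and flowtable scoped to that namespace
    ip netns exec vm1 nft add table inet vm_filter
    ip netns exec vm1 nft add flowtable inet vm_filter ft \
        '{ hook ingress priority 0; devices = { veth_vm1 }; }'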


Sorta. You are using hardware offload at a level above the VM. There are a lot of situations in which traffic can interact poorly with that NIC, and you won't really be able to identify which traffic made the NIC respond the way it did when a bug hits. I'm assuming you are not using SR-IOV VF devices if you are managing offload at the parent. Probably for the best with that card.


I think history is at least part of this. The software flow table implementation (which Ubicloud demonstrates here; full disclosure, I work there) offers some speedup, but the motive appears closely coupled with hardware flow table offloading: https://lwn.net/Articles/738214/


nftables is very flexible and flowtables won't work (transparently) in all configurations.

For example, sending packets from a single connection over multiple links round-robin: the cache will remember only one link and route all packets over it.

And packets in offloaded connections will bypass nftables rate/bandwidth limits and counters.
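
One mitigation is to offload only the flows you don't need to meter. A sketch (the flowtable name @ft and the port choices are placeholders):

    # metered traffic: never offloaded, so the limit and counter keep applying
    tcp dport 22 limit rate 5 mbytes/second counter accept
    # bulk traffic: offloaded after the first packets, which then skip this chain
    tcp dport { 80, 443 } flow offload @ft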



Nicely written article, and now I know about Ubicloud.


Presumably the best way is to skip the kernel altogether. Are there any decent ways to run the network stack in userspace yet?

-edit- I know userspace networking may not be relevant in the author's case, but it is of interest to me.


Here's a good article about your options.

https://blog.cloudflare.com/kernel-bypass/


The latest hot one is eBPF/XDP: https://en.wikipedia.org/wiki/Express_Data_Path. However, I think in most cases skipping the kernel might not be such a great idea. A lot of kernel features are there for a reason (e.g. routing/ARP lookup, fragmentation/reassembly); if you skip them, you have to implement those features in userspace...
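
For a taste, attaching a pre-compiled XDP object with iproute2 looks roughly like this (prog.o is a placeholder for your compiled eBPF program):

    # attach in generic (skb) mode, which works without driver support
    ip link set dev eth0 xdpgeneric obj prog.o sec xdp
    # inspect, then detach
    ip link show dev eth0
    ip link set dev eth0 xdpgeneric off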


Maybe they should have been in user-space to begin with? Micro-service kernel, here we come.


Podman and netavark I guess


> an opensource alternative to AWS

Just putting it out there that OpenStack is open source, already exists, and is very feature-complete, and there are even hosting providers that will give you your own OpenStack control plane and only bill you for the resources you use. Only one provider in the US, but several in Europe.

No need to deploy and manage your own clusters on bare metal. They do it all for you and just give you an API, same as AWS. Way better than managing your own stack. The fact that more providers aren't doing this kind of blows my mind, but they probably prefer the proprietary walled garden; it's easier to keep customers from moving.


Should I really read about these tables, or will the Linux kernel replace them with yet another set of tables in a few years, with almost-but-not-quite-the-same semantics, a different command-line tool, a different column order, etc.?

-- disgruntled user


I was confused by this as well, and the documentation is not always clear about it.

nftables and the netfilter project are the firewall implementation in Linux.

The legacy and beloved iptables format is fully replaced nowadays by nftables. You don't have to learn anything new, because the iptables command line is just a compatibility layer on top of nftables. When you insert iptables rules, they get translated to nftables seamlessly. This has been the default on all major distros for years.
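
You can even ask for the translation yourself with iptables-translate (exact output varies a bit by version):

    # translate a classic iptables rule into its nftables equivalent
    $ iptables-translate -A INPUT -p tcp --dport 22 -j ACCEPT
    nft add rule ip filter INPUT tcp dport 22 counter accept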

Converting to nftables has a few neat advantages, such as much-improved support for sets, maps, and verdict maps, unified IPv4, IPv6, and bridge rules, etc. But you don't have to. Everything old still works.

Flowtables are an optional feature of netfilter, I think originally meant to interface with hardware NAT accelerators in cheap routers, but there is also a default pure-software implementation that can speed things up in some cases. That's what is being discussed in this article.

You use nftables to define and hook into flowtables. They work together, not against each other.



