Hacker News

In the context of coder, the userspace TCP overhead should be negligible. Based on https://gvisor.dev/docs/architecture_guide/performance/, and assuming runc mostly uses the regular kernel networking stack (I think it does, since it mostly just does syscall filtering), it should be at most a 30% direct TCP performance hit. But a real application typically spends only a negligible fraction of its total time in the TCP stack - the client code, total e2e latency, and server code corresponding to a particular packet take much more time.
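A back-of-the-envelope sketch of that argument (the 2% TCP-stack fraction below is my assumption for illustration, not a measurement):

```python
# Amdahl-style estimate: if only a small fraction of end-to-end request
# time is spent inside the TCP stack, a 30% slowdown in that stack
# barely moves the total. All numbers here are assumed, not measured.

def effective_slowdown(tcp_fraction: float, tcp_penalty: float) -> float:
    """Overall request slowdown when only `tcp_fraction` of the time
    is subject to a `tcp_penalty` relative slowdown."""
    return tcp_fraction * tcp_penalty

# Suppose 2% of a request's wall time is in the TCP stack:
print(f"{effective_slowdown(0.02, 0.30):.1%}")  # 0.6% end-to-end impact
```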

You'll note their Node/Ruby benchmarks showed a substantially bigger performance hit. That's partly because gVisor's other sandboxing functionality (general syscall interception plus file I/O) has more of an impact on performance, but also because these are network-processing-bound applications (rare) that were still reaching high QPS in absolute terms for their respective runtimes (do you know many real-world Node apps doing 350-800 QPS per instance?).

Because coder is not likely to be bottlenecked by CPU availability for networking, the resource overhead should be inconsequential; what really matters is the impact on user latency. But that's likely on the order of 1ms added to a roundtrip that is already spending 30-50ms at best in transit between client and server (given that coder's server would be running in a datacenter with clients at home or the office), plus actual application logic that takes at best 10ms. And that's very similar to a lot of other gVisor netstack use cases, which is why it's not as big a deal as you think it is.
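Working the same numbers explicitly (all values are the best-case assumptions stated above):

```python
# Rough model of the roundtrip budget described above.
netstack_overhead_ms = 1.0   # assumed extra cost from userspace TCP, per roundtrip
app_logic_ms = 10.0          # assumed server-side application work, best case

def added_latency_fraction(transit_ms: float) -> float:
    """Netstack overhead relative to the existing roundtrip budget."""
    return netstack_overhead_ms / (transit_ms + app_logic_ms)

for transit_ms in (30.0, 50.0):
    print(f"transit {transit_ms:.0f}ms -> "
          f"{added_latency_fraction(transit_ms):.1%} overhead")
# Prints 2.5% and 1.7% - i.e. "like 2%", and real workloads with slower
# application logic or longer transit would see much less.
```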

TLDR: For the stuff you'd actually care about (roundtrip latency) in the coder use case, the perf hit of using gVisor netstack should be like 2% at most, and most likely much less. Either way it's small enough to be imperceptible to the actual human using the client.




TCP overhead is only part of the story. There's 20-40x overhead on syscalls, ~20% overhead running a TensorFlow project end to end, 50% fewer RPS in Redis, etc.
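For context on figures like "20-40x on syscalls": they come from micro-benchmarks of raw syscall round-trip cost. A minimal sketch of how one might measure the unsandboxed baseline (absolute numbers vary wildly by hardware, kernel, and libc caching; run the same loop inside runsc to get the comparison):

```python
import os
import time

def time_syscall(n: int = 100_000) -> float:
    """Average seconds per os.getpid() call (a near-trivial syscall).

    Measures only the unsandboxed baseline; repeating this inside a
    gVisor sandbox would show the syscall-interception multiplier.
    """
    start = time.perf_counter()
    for _ in range(n):
        os.getpid()
    return (time.perf_counter() - start) / n

per_call = time_syscall()
print(f"{per_call * 1e9:.0f} ns per getpid()")
```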


We are still talking about people using runsc/runc. That's not what `coder` is doing. All they did was poach a (popular) networking library from the gVisor codebase. None of this benchmarking has anything to do with their product.


I've already accepted this whole thread is a digression, but I keep getting pulled in. Calling out my dislike for gVisor on a thread lauding a 5x TCP performance improvement they found in it felt on topic to me at the time.


Ok. I'm only triggered by two things:

1. An argument that a tool using netstack is in any way tainted with gVisor's runtime costs.

2. An argument that shared-kernel multitenancy is tenable and thus gVisor addresses no meaningful security concerns.


Not gonna lie, I'm also getting 200% triggered whenever he cites gVisor syscall costs lol


Are they even using runc/runsc?


At coder, no, since "gVisor is a container runtime that reimplements the entire Linux ABI (syscalls) in Go, but we only need the networking for our purposes".

But gVisor was using full runsc for the networking benchmarks I linked, and IIUC runc's networking is sufficiently similar to unsandboxed networking that the runsc<->runc network performance difference should approximate the gVisor netstack<->vanilla kernel networking difference.



