I don't know anything about Coder, but Gvisor proliferation is annoying. It's a boon for cloud providers, giving them yet another way to deliver a several-fold performance decrease per dollar spent in exchange for questionable security benefits. And I'm seeing it everywhere now.
I don't understand - what do you suggest as an alternative to Gvisor?
> large multiple performance decrease per dollar spent
Gvisor helps you offer multi-tenant products, which can actually be much cheaper to operate and offer to customers, especially when their usage is lower than a single VM would require. Also, a lot of applications won't see big performance hits from running under Gvisor, depending on their resource requirements and perf bottlenecks.
> I don't understand - what do you suggest as an alternative to Gvisor?
Their performance documents (which you linked) claim, vs runc: 20-40x syscall overhead, half of Redis' QPS, and a 20% increase in runtime for a sample TensorFlow script. Also google "CloudRun slow" and "Digital Ocean Apps slow"; both run on Gvisor.
I was the original author of that performance guide, a good while ago. I tried to lay out the set of performance trade-offs in an objective and realistic way. It is shocking to me that you're spending so much time commenting on a few figures from there, ostensibly w/o reading it.
System call overhead does matter, but it's not the ultimate measure of anything. If it were, gVisor with the KVM platform would be faster than native containers (see the runsc-kvm data point, which you've ignored for some reason). But it is obviously more complex than that alone. For example, let's drill down and ask: how is it even possible to be faster? The default Docker seccomp profile itself installs a BPF filter that slows system calls by 20x! (And that path does not apply within the guest context.) On that basis, should you start shouting that everyone should stop using Docker because of the system call overhead? I would hope not, because looking at any one figure in isolation is dumb; consider the overall application and architecture. Containers themselves have a cost (higher context switch time due to cgroup accounting, the cost of devirtualizing namespaces in many system calls, etc.), but it's obviously worth it in most cases.
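If you actually want to know what the raw system call path costs in a given setup, it takes a minute to measure rather than argue about. A rough sketch (my own toy benchmark, not anything from the guide; absolute numbers vary wildly with hardware, kernel, and platform) that you can run under runc, under runsc with the ptrace or KVM platform, and with/without the default seccomp profile:

    package main

    import (
        "fmt"
        "syscall"
        "time"
    )

    func main() {
        // Time a do-almost-nothing syscall in a tight loop. Re-run the same
        // binary under runc, runsc, and with/without Docker's default seccomp
        // profile to compare raw syscall round-trip cost across environments.
        const n = 1_000_000
        start := time.Now()
        for i := 0; i < n; i++ {
            syscall.Getppid()
        }
        elapsed := time.Since(start)
        fmt.Printf("%d getppid calls in %v (~%.0f ns/call)\n",
            n, elapsed, float64(elapsed.Nanoseconds())/float64(n))
    }

Just don't treat the ns/call number as a proxy for whole-application performance, which is the whole point above.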
The Redis case is called out as a worst case: the application itself does very little beyond dispatching I/O, so almost everything manifests as overhead. But if you're doing something that carries 20% overhead, needs hard security boundaries, and where fine-grained multi-tenancy can lower costs by 80%, it might make perfect sense. If something doesn't work for you because your trade-offs are different, just don't use it!
> it is shocking to me that you’re spending so much time commenting on a few figures from there
You give me too much credit! They were copy-pastes to the same responder, who replied to me in a few places in the thread. I did that to avoid spending too much time responding!
> because looking at any one figure in isolation is dumb
So the self-reported performance figures are bad, there are hundreds of web pages and support threads reporting slow performance and slow startup times from first-hand experience, and there are Google-hosted documentation pages about how to improve app performance for CloudRun (probably the largest user of Gvisor, and its creators; can I assume they know how to run it?) including gems like "delete temporary files" and a blog post recommending "using global variables" (I'm not joking). And the accusation is "dumb" cherry-picking? Huh?
Also, if I'm not wrong, CloudRun is GCP's main (only, besides managed K8s?) PaaS container runtime. Presenting it as a general container runtime with ultra-fast scaling, when people online are reporting 30-second startup times for basic Python/Node apps, is a joke. These tradeoffs should also be highlighted somewhere on those sales pages, but they're not.
This is the last I'm responding to this thread. Also my apologies to the Coder folks for going off topic like this.
I don’t think copy/pasting the same response everywhere is better.
IIRC, CloudRun has multiple modes of operation (fully-managed and in a K8s cluster) and different sandboxes for the fully-managed environment (VM-based and gVisor-based). Like everything, performance depends a lot on the specifics. For example, network performance depends a lot more on the network path (e.g. are you using a VPC connector?) than it does on the specific sandbox or network stack (i.e. if you want to push 40gbps, spin up a dedicated GCE instance). Similarly, the lack of a persistent disk is a design choice for multiple reasons; if you need a lot of non-tmpfs disk or persistent state, CloudRun might not be the right place for the service.
It sounds like you personally had a bad experience or hit a sharp edge, which sucks and I empathize. But I think you can just be concrete about that rather than projecting with system call times. (I'd be happy to give you the reason the gen1 sandbox would be slow for a typical heavy Python app doing 20,000 stat() calls on startup; it's real, but it isn't really system calls or anything you're pointing at. Either way you could just turn on gen2 or use other products, e.g. GCE containers, GKE Autopilot, etc.)
I’m not sure what’s wrong with advice re: optimizing for a serverless platform (like global variables). I don’t really think it would be sensible to recompute/rebuild application state on any serverless platform on any provider.
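Concretely, that advice just means "cache expensive state across requests instead of rebuilding it every time". A minimal sketch of the pattern as a generic Go HTTP service (nothing CloudRun-specific; the names are made up):

    package main

    import (
        "net/http"
        "os"
        "sync"
    )

    // Package-level ("global") state survives across requests served by the
    // same instance, so expensive setup runs once per cold start instead of
    // once per request.
    var (
        initOnce sync.Once
        bigState map[string]string // stand-in for parsed templates, DB pools, model weights, etc.
    )

    func state() map[string]string {
        initOnce.Do(func() {
            // Expensive initialization goes here; it runs exactly once.
            bigState = map[string]string{"greeting": "hello"}
        })
        return bigState
    }

    func handler(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte(state()["greeting"]))
    }

    func main() {
        port := os.Getenv("PORT")
        if port == "" {
            port = "8080"
        }
        http.HandleFunc("/", handler)
        http.ListenAndServe(":"+port, nil)
    }

That's sensible engineering on any serverless platform, not a gVisor workaround.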
Are you referring to gVisor the container runtime, or gVisor/netstack, the TCP/IP stack? I see more uptick in netstack. I don't see proliferation of gVisor itself. "Security" is much more salient to gVisor than it is to netstack.
On the issue of abysmal performance on cloud-compute/PaaS, I'm talking about the container runtime (most PaaS is gVisor or Firecracker, no?): CloudRun, DO, Modal, etc.
But given this article is about significantly improving gVisor's userland TCP performance, it seems like the netstack part causes major performance losses too.
A TCP/IP stack is not an "implementation of syscalls". The things most netstack users do with netstack have nothing to do with wanting to move the kernel into userland and everything to do with the fact that the kernel features they want to access are either privileged or (in a lot of IP routing cases) not available at all. Netstack (like any user-mode IP stack) allows programs to do things they couldn't otherwise do at all.
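For concreteness, this is roughly what using netstack as a library looks like (a sketch against the gvisor.dev/gvisor module; the constructors have moved around between releases, so treat the exact names as approximate): an unprivileged process gets its own TCP/IP stack and a standard-library-style listener, with no root, no raw sockets, and no TUN device required.

    package main

    import (
        "fmt"

        "gvisor.dev/gvisor/pkg/tcpip"
        "gvisor.dev/gvisor/pkg/tcpip/adapters/gonet"
        "gvisor.dev/gvisor/pkg/tcpip/link/channel"
        "gvisor.dev/gvisor/pkg/tcpip/network/ipv4"
        "gvisor.dev/gvisor/pkg/tcpip/stack"
        "gvisor.dev/gvisor/pkg/tcpip/transport/tcp"
    )

    func main() {
        // A userspace stack with just IPv4 + TCP.
        s := stack.New(stack.Options{
            NetworkProtocols:   []stack.NetworkProtocolFactory{ipv4.NewProtocol},
            TransportProtocols: []stack.TransportProtocolFactory{tcp.NewProtocol},
        })

        // In-memory link endpoint; a real program wires this to a WireGuard
        // tunnel, a TUN fd, a pcap feed, etc. (Route table setup omitted.)
        ep := channel.New(256, 1500, "")
        if err := s.CreateNIC(1, ep); err != nil {
            panic(fmt.Sprint(err))
        }

        // Give NIC 1 the address 10.0.0.1/24.
        addr := tcpip.AddrFrom4([4]byte{10, 0, 0, 1})
        if err := s.AddProtocolAddress(1, tcpip.ProtocolAddress{
            Protocol:          ipv4.ProtocolNumber,
            AddressWithPrefix: tcpip.AddressWithPrefix{Address: addr, PrefixLen: 24},
        }, stack.AddressProperties{}); err != nil {
            panic(fmt.Sprint(err))
        }

        // gonet adapters expose net.Listener/net.Conn, so the rest of the
        // program is ordinary Go networking code.
        l, err := gonet.ListenTCP(s, tcpip.FullAddress{NIC: 1, Addr: addr, Port: 8080}, ipv4.ProtocolNumber)
        if err != nil {
            panic(err)
        }
        defer l.Close()
        fmt.Println("listening on", l.Addr())
    }

None of that touches privileged kernel interfaces, which is the point.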
The gVisor/perf thing is a tendentious argument. You can have whatever opinion you like about whether running a platform under gVisor supervision is a good idea. But the post we're commenting on is obviously not about gVisor; it's about a library inside of gVisor that is probably a lot more popular than gVisor itself.
Interesting to dismiss it as such. The gvisor netstack is a (big) part of gvisor and this article is discussing how the performance of that component was, and could well still be, garbage.
These tools bring marginal capability and performance gains, shoved down people's throats by manufacturing security paranoia. Oh, and it all happens to cost you like 10x the time, but look at the shiny capabilities, trust me it couldn't be done before! A netsec and infra peddler's wet dream.
> The gvisor netstack ... this article is discussing how the performance of that component was ... garbage.
The article and a related GitHub discussion (linked from TFA) point out that the default congestion control algorithm (reno) wasn't good for long-distance (over-Internet) workloads. The gvisor team never noticed because they test/tune for in-datacenter use cases.
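For reference (and I'm not claiming this is exactly what the patch does, just pointing at where the knob lives), switching netstack's stack-wide congestion control is tiny. A sketch, with the caveat that option names and signatures shift between gVisor releases:

    package netstacktune

    import (
        "fmt"

        "gvisor.dev/gvisor/pkg/tcpip"
        "gvisor.dev/gvisor/pkg/tcpip/stack"
        "gvisor.dev/gvisor/pkg/tcpip/transport/tcp"
    )

    // useCubic asks the stack to default new TCP endpoints to cubic instead
    // of reno, which generally behaves better on long, lossy Internet paths.
    func useCubic(s *stack.Stack) error {
        opt := tcpip.CongestionControlOption("cubic")
        if err := s.SetTransportProtocolOption(tcp.ProtocolNumber, &opt); err != nil {
            return fmt.Errorf("set congestion control: %s", err)
        }
        return nil
    }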
> These tools bring marginal capability and performance gains
I get your point (ex: the app sandbox on Android hurts battery & perf, the website sandbox in Chrome wastes memory, etc). While 0-days continue to sell for millions, opsec folks are right to be skeptical about a very critical component (the kernel) that runs on 50%+ of all servers & personal devices.
In the context of coder, the userspace TCP overhead should be negligible. Based on https://gvisor.dev/docs/architecture_guide/performance/ and assuming runc mostly just uses the regular kernel networking stack (I think it does, since it mostly just does syscall filtering?), it should be at most a 30% direct TCP performance hit. But in a real application you typically spend only a negligible amount of total time in the TCP stack: the client code, total e2e latency, and server code corresponding to a particular packet will take much more time.
You'll note their Node/Ruby benchmarks showed a substantially bigger performance hit. That's because the other gvisor sandboxing functionality (general syscall + file I/O) has more of an impact on performance, but also because these are network-processing-bound applications (rare) that were still reaching high QPS in absolute terms for their respective runtimes (do you know many real-world Node apps doing 350-800 qps per instance?).
Because coder is not likely to be bottlenecked by CPU availability for networking, the resource overhead should be inconsequential, and what's really important is the impact on user latency. But that's something likely on the order of 1ms for a roundtrip that is already spending probably 30-50ms at best in transit between client and server (given that coder's server would be running in a datacenter with clients at home or the office), plus the actual application logic overhead which is at best 10ms. And that's very similar to a lot of gvisor netstack use cases which is why it's not as big of a deal as you think it is.
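Back-of-envelope with those (rough, assumed) numbers: ~1ms of netstack overhead against ~40ms of transit plus ~10ms of application logic, i.e. about 1/51 ≈ 2% of the round trip.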
TLDR: For the stuff you'd actually care about (roundtrip latency) in the coder use case, the perf hit of using gvisor netstack should be like 2% at most, and most likely much less. Either way it's small enough to be imperceptible to the actual human using the client.
We are still talking about people using runsc/runc. That's not what `coder` is doing. All they did was poach a (popular) networking library from the gVisor codebase. None of this benchmarking has anything to do with their product.
I've already accepted this whole thread is a digression, but I keep getting pulled in. Calling out my dislike for Gvisor on a thread lauding a 5x TCP performance improvement they found in it felt on topic to me at the time.
At coder, no, since "gVisor is a container runtime that reimplements the entire Linux ABI (syscalls) in Go, but we only need the networking for our purposes",
but gvisor was using full runsc for the networking benchmarks I linked, and IIUC runc's networking should be sufficiently similar to unsandboxed networking that I believe the runsc<->runc network performance difference should approximate the gvisor netstack<->vanilla kernel networking difference.
No. The providers that did so soundly used virtualization to accomplish this, and a big part of the appeal of K8s is having a much more lightweight unit of scheduling than full virtualization. gVisor is a middle ground between full virtualization and shared-kernel multitenancy (which has an abysmal security track record).
Virtualization, LXC, containers (and K8s), etc. were solutions to "secure shared environments". And they have an order of magnitude lower performance hit than gvisor does (google 'cloudrun python startup times' if you're curious about the real impact of this stuff).
Have we proven they're not secure and safe? Have we broken out of containers yet? Heroku was running LXC for years before Docker; did they run into major security woes (actually curious)?
If "secured shared environments" is a more specific term meaning "multi user unix environment", I didn't intend to say that.
You already mentioned my whole thread is a bit off topic for this post (and I sorta agree), but then you baited me with this comment. I'm happy to drop it and wait for a Gvisor container runtime thread.
Containers are not compute environments; their runtimes are, and gvisor (runsc) is one implementation of that. Docker Engine (~runc) is another. It has similar performance characteristics to gvisor afaict looking online (the minimum cold start times I'm seeing are 500ms, which I've beaten with gvisor), yet implements fewer security features.
If by virtualization you mean VMs, gvisor can be more performant than those, based on my experience. For example, AWS claims a p0 cold start time of ~500ms using Firecracker, but I know firsthand that applications sandboxed by gvisor can be made to cold start in significantly less time (like less than half): https://catalog.workshops.aws/java-on-aws-lambda/en-US/03-sn..., and you should be able to confirm this yourself by using products that leverage Gvisor under the hood or with your own testing. I actually worked on this (using gvisor, but on adjacent tech) for years...
I'll note that a lot of people are thinking about how to reduce sandbox overhead in multitenant PaaS and it's one of the things I want to eventually address in my own startup. But I think blindly hating on gvisor because of a nebulous dislike of overhead really is misplaced without considering its alternatives.
The charts you linked in the performance guide show a 30x syscall overhead for runsc vs runc (careful, quite a few of the charts are logarithmic). That's insane! They also go on to claim a 20% difference on a TensorFlow workload.
> I want to eventually address in my own startup.
You worked on CloudRun and their performance is dogshit. Seriously, google it; there are like 100 Stack Overflow questions on the subject. It's a common enough query that Google even suggests follow-up questions like "Why is cloud run so slow?".
Now your answer might be "avoid syscalls", "don't do anything on the file system (oh, by the way, your file system is memory-mapped, hehe)", "interpreters can be slow to load their code, sorry", "look at these charts, it's not as bad as you say", "TCP overhead is only 30%", etc., but your next set of customers won't have the same vendor lock-in you enjoyed at Google.
Then do the same query for "Digital Ocean Apps slow", also gvisor. And bam you'll have a long list of customers ready to use your better version! Perhaps Google and Digital Ocean will enlist your expertise (again).
Yes, we have proven that shared-kernel multitenancy is unsafe. The best example (though there are many) is the `waitid` LPE; nobody's container lockdown configuration was blocking `waitid`, which is what you'd have had to do to prevent container code from compromising the kernel. The list of Linux LPEs is long, and the list of syzkaller crashes longer still.
If they are using multitenant Docker / containerd containers with no additional sandboxing, then yes, it's only a matter of time and attacker interest before a cross-tenant compromise occurs.
There isn't realistic sandboxing you can do with shared-kernel multitenant general-workload runtimes. You can do shared-kernel with a language runtime, like V8 isolates. You can do it with WASM. But you can't do native binary Unix execution and count on sandboxing to fix the security issues, because there's a track record of LPEs in benign system calls.
OpenVZ, Virtuozzo and friends definitely weren't secure the way gVisor or Firecracker are. You can still do that, and some providers do, but that doesn't make it a good idea.