Predictive CPU isolation of containers at Netflix (2019) (netflixtechblog.com)



It seems like we are moving further and further away from having the OS manage our resources

Runtimes/VMs implement memory management, various threading techniques, and things like what we see here

Maybe in the future we will skip the OS's overhead entirely and run apps directly on hardware, with their runtimes/VMs (JVM, CLR) managing resources more efficiently themselves


That's definitely a trend we're moving towards for extremely high-performance software. General-purpose operating systems often don't sit at the right level of abstraction and lack the flexibility certain demanding workloads need. Kernel-bypass networking is the gold standard for low-latency, high-throughput networking. Serverless platforms often rely on userspace schedulers and userspace page-fault handlers.

That's one of the reasons unikernels seem like a promising way forward. They open up a bunch of opportunities, including language-based safety and compile-time optimizations, and they more closely mirror how we wish to run & deploy modern applications (declarative, immutable, and ideally with a bare minimum of dependencies).


Is it Node that still limits all processes to 2GB or something by default? (I think their rationale was “it’s a V8 flag, so we don’t touch it.”)



The upside is that it makes sure your stuff can be deployed on 32-bit.


Does that come in handy often?


And yet pointer compression is turned off by default.


More like "kernel programming is hard, let's put fancier logic and RPC in userspace". Which sounds perfectly sane.


(I'm the author of the blog post)

Beyond "kernel programming is hard", there are a few other reasons why it made sense for us:

- observability & maintenance: it's much easier to implement and ship this type of change in userspace than to roll out a kernel fork. We also built custom A/B infrastructure to be able to evaluate these optimizations.

- the kernel is really good at making reasonable decisions at high frequency based on a limited amount of data and heuristics, but those decisions are far from optimal in all scenarios. In userspace, by contrast, we can make better decisions based on more data (or ML predictions), just less frequently. A rough sketch of the mechanism follows.
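
It boils down to computing placements in userspace and writing them into the kernel's cpuset cgroup controller. A highly simplified sketch (assuming cgroup v1 cpusets at the usual mount point and a hypothetical placements mapping produced by the optimizer; not our actual implementation):

    # Push per-container CPU placements, computed in userspace, into the
    # kernel via the cpuset cgroup controller (cgroup v1 assumed; the
    # cgroup names and placements below are hypothetical).
    CPUSET_ROOT = "/sys/fs/cgroup/cpuset"

    def apply_placements(placements):
        """placements maps cgroup name -> CPU list, e.g. {"container-a": "0-3"}."""
        for cgroup, cpus in placements.items():
            with open(f"{CPUSET_ROOT}/{cgroup}/cpuset.cpus", "w") as f:
                f.write(cpus)

    apply_placements({"container-a": "0-3,16-19", "container-b": "4-7,20-23"})

The kernel keeps enforcing the placement at its usual frequency; userspace only re-solves and rewrites these files when the predictions change.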


Meh, not really? This seems more analogous to memory-allocator optimization, where your libc malloc() is "optimized" to give adequate performance across all sorts of allocation patterns, but you can do much better if you know a priori what your application's actual pattern will be. Just swap "malloc()" for "the process scheduler" here.
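
The same idea in miniature (a toy free-list pool; the buffer sizes are hypothetical, and a real allocator comparison would be done in C, but the principle is identical): if you know a priori that every allocation is a fixed-size buffer, a trivial specialized pool beats asking the general-purpose allocator every time.

    # Toy illustration: a pool tuned to one known allocation pattern
    # (fixed-size buffers) instead of a general-purpose allocator.
    class BufferPool:
        def __init__(self, buf_size=4096, prealloc=1024):
            self._buf_size = buf_size
            self._free = [bytearray(buf_size) for _ in range(prealloc)]

        def alloc(self):
            # Reuse a recycled buffer when possible, else allocate fresh.
            return self._free.pop() if self._free else bytearray(self._buf_size)

        def free(self, buf):
            self._free.append(buf)

    pool = BufferPool()
    buf = pool.alloc()   # ...fill and use buf...
    pool.free(buf)       # recycle instead of discarding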


Worth watching The Birth and Death of JavaScript for more on this hypothetical future. In theory you could get rid of system-call and virtual-memory overhead and make something like JavaScript run at "fully native" speed, because removing the OS-related overhead offsets the performance lost to being JavaScript. This is only really a viable future for managed languages, because the runtime needs to ensure memory safety, isolation between "processes", etc., which they mostly do already anyway.

https://www.destroyallsoftware.com/talks/the-birth-and-death...


Could build them on top of unikernels.


wash, rinse, repeat.


This is amazing: they use ML to predict utilization on the fly


Related:

Predictive CPU isolation of containers at Netflix using a MIP solver - https://news.ycombinator.com/item?id=21116565 - Sept 2019 (21 comments)

Predictive CPU Isolation of Containers at Netflix - https://news.ycombinator.com/item?id=20096699 - June 2019 (1 comment)



Also sched-ext, which seems close to being mainlined and is already a default scheduler in CachyOS:

https://github.com/sched-ext/scx


Kind of an old article. It's a pretty straightforward thing to do: if you spend enough time accurately load-testing your environments, you can dial in the container resources and shave off thousands of dollars. Lots of places are too scared of under-allocating. Limit and request exist for a reason: request is what is always guaranteed, and limit is for surge. It is okay to exceed your request as long as you add a scaling policy to balance out the surge. And be cautious with request and limit on memory; not all applications benefit from this.
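
For reference, the knob being discussed is the per-container requests/limits pair. A minimal sketch using the official Kubernetes Python client (the values are hypothetical; the equivalent YAML in a pod spec behaves the same way):

    # Requests are the guaranteed floor; limits are the surge ceiling.
    from kubernetes import client

    resources = client.V1ResourceRequirements(
        requests={"cpu": "500m", "memory": "256Mi"},  # always guaranteed
        limits={"cpu": "2", "memory": "256Mi"},       # cap for surges
    )
    container = client.V1Container(
        name="app",
        image="example/app:latest",  # hypothetical image
        resources=resources,
    )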


They're automatically predicting the limit _and_ figuring out binpacking onto hyperthreaded CPUs and NUMA nodes. K8s just pushes your supplied values down to the kernel, which is exactly what they're saying is inefficient.
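
For flavor, that placement step can be posed as a small MIP. Here is a toy sketch with PuLP (hypothetical demands and capacities, and a far simpler formulation than the article's): assign each container to exactly one NUMA node without exceeding its cores, while balancing load.

    import pulp

    demands = {"a": 3, "b": 2, "c": 4}   # CPUs requested per container
    nodes = {"node0": 8, "node1": 8}     # cores available per NUMA node

    prob = pulp.LpProblem("placement", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", (demands, nodes), cat="Binary")
    max_load = pulp.LpVariable("max_load", lowBound=0)

    for c in demands:  # each container lands on exactly one node
        prob += pulp.lpSum(x[c][n] for n in nodes) == 1
    for n, cap in nodes.items():
        load = pulp.lpSum(demands[c] * x[c][n] for c in demands)
        prob += load <= cap       # respect node capacity
        prob += load <= max_load  # track the most-loaded node

    prob += max_load  # objective: minimize the worst-case node load
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    for c in demands:
        for n in nodes:
            if pulp.value(x[c][n]) == 1:
                print(c, "->", n)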


It is indeed inefficient, so this is more like a Process Lasso approach to resource management?


If the number of servers needed for service A is proportional to the number needed for services B-Z, then your whole cluster scales up and down together, and you hit the max cluster size regularly instead of almost never. For private servers that's a big problem, but even if you're a large enough customer of a cloud provider it can still be a problem.

You still save money, but you don't solve your capacity problems by doing so.



