11. notice that there's a unicode rendering error ("'" for apostrophe) in the kernel_initializer and bias_initializer default arguments in the documentation, and wonder why on earth such a high-level API would expose lora_rank as a first-class construct. Also, 3 out of the 5 "Used in the guide" links point to TF1-to-TF2 migration articles - TF2 was released 5 years ago.
Yep, in Netflix's case they pack bare-metal instances with a very large number of containers and oversubscribe them (similar to what Borg reports: hundreds of containers per machine is common), so there are always more runnable threads than CPUs and your runqueues fill up.
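To make the runqueue point concrete, here's a minimal sketch (assuming a kernel built with CONFIG_SCHED_INFO, which most distro kernels enable) that samples /proc/&lt;pid&gt;/schedstat to estimate how much of a process's runnable time is spent queued rather than on-CPU:

```python
#!/usr/bin/env python3
"""Rough run-queue wait estimate for one PID via /proc/<pid>/schedstat.

Sketch only: assumes a kernel with CONFIG_SCHED_INFO, where the file holds
three values: time on CPU (ns), time waiting on a runqueue (ns), timeslices.
"""
import sys
import time

def read_schedstat(pid: int) -> tuple[int, int]:
    with open(f"/proc/{pid}/schedstat") as f:
        on_cpu_ns, wait_ns, _slices = (int(x) for x in f.read().split())
    return on_cpu_ns, wait_ns

def sample(pid: int, interval_s: float = 1.0) -> None:
    prev_cpu, prev_wait = read_schedstat(pid)
    while True:
        time.sleep(interval_s)
        cpu, wait = read_schedstat(pid)
        d_cpu, d_wait = cpu - prev_cpu, wait - prev_wait
        total = d_cpu + d_wait
        pct = 100.0 * d_wait / total if total else 0.0
        print(f"ran {d_cpu/1e6:.1f} ms, waited {d_wait/1e6:.1f} ms "
              f"({pct:.0f}% of runnable time spent queued)")
        prev_cpu, prev_wait = cpu, wait

if __name__ == "__main__":
    sample(int(sys.argv[1]))
```

On an oversubscribed host that queued fraction climbs well above what you'd see on a dedicated box.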
I'm curious about the capacity of the bare-metal hosts you operate such that you can oversubscribe CPU without exhausting memory first or forcing processes to swap (which leads to significantly worse latency than typical scheduling delays). My experience is that most machines end up memory-bound, because modern software, especially the Java workloads I know Netflix runs a lot of, can be a profligate memory consumer.
Workloads tend to average out if you pack dozens or hundreds into one host. Some need more CPU and some need more memory, but an average ratio emerges... I like 4 GB/core.
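As a back-of-the-envelope illustration of that averaging (all workload numbers below are invented, not Netflix's), you can compute the blended GB/core of a container mix and compare it against a 4 GB/core host:

```python
# Toy capacity check: does a mix of containers blend out to ~4 GB/core?
# All workload shapes here are made up purely for illustration.
containers = [
    # (name, cpu_cores_requested, memory_gb_requested, count)
    ("java-api",     4,   16, 30),   # memory-heavy
    ("batch-encode", 8,    8, 10),   # cpu-heavy
    ("sidecar",      0.5,  1, 100),  # tiny
]

total_cores = sum(cpu * n for _, cpu, _, n in containers)
total_mem   = sum(mem * n for _, _, mem, n in containers)
blend = total_mem / total_cores
print(f"{total_cores:.0f} cores, {total_mem:.0f} GB -> {blend:.2f} GB/core")

host_gb_per_core = 4.0  # the ratio mentioned above
if blend <= host_gb_per_core:
    print("CPU runs out first on a 4 GB/core host: oversubscribe CPU.")
else:
    print("Memory runs out first: this mix needs a fatter-memory host.")
```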
Yep. In Netflix's case each Titus host can run hundreds of containers per bare-metal instance at any given time. One advantage of running a multi-tenant platform like this is that you get better observability into multi-tenancy issues, since you're doing the scheduling yourself and know who is colocated with whom. It's much harder to debug noisy-neighbor issues when they're happening on the cloud provider's side and your caches get thrashed by random other AWS customers.
One thing I was pitching internally when advocating for this platform is that, once you have the scale for the economics to make sense, you can reclaim some of AWS's margins instead of having your cold, tiny VMs subsidize other AWS customers' higher performance. If you run the multi-tenant platform yourself, you can oversubscribe every app in a way that makes sense for your business and trade latency or throughput of software for $ on a per-container basis, so you can make much more granular and optimal decisions globally, vs. having each team individually right-size their own app deployed on VMs and sharing CPU caches with randos.
I remember once at Netflix we investigated a weird latency issue on a random load balancer instance and got AWS involved: it turned out to be a noisy neighbor on the underlying VM that gets chopped up into multiple customer-facing LB instances.
According to [1], Fargate is actually not using Firecracker, but probably something closer to a single container running in a single-tenant EC2 VM. If true, this makes VM boot-time optimizations and warm pooling even more important for such a product.
Beyond "kernel programming is hard", there are a few other reasons why it made sense for us:
- observability & maintenance: it's much easier to implement and ship this type of change in userspace than to roll out a kernel fork. We also built custom A/B infra to be able to evaluate these optimizations.
- the kernel is really good at making reasonable decisions at high frequency based on a limited amount of data and heuristics, but those decisions are far from optimal in all scenarios. In contrast, in userspace we can make better decisions based on more data (or ML predictions), but make them less frequently (the sketch below shows the general shape of that loop).
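To illustrate the shape of such a userspace loop (this is not Netflix's actual agent; the cgroup path, core count, and policy below are placeholders), one can periodically rank containers by recent CPU use and rewrite their cpuset assignments, assuming cgroup v2 with the cpuset controller enabled:

```python
"""Skeleton of a userspace CPU-isolation loop (illustrative only, not Titus's agent).

Assumes cgroup v2 mounted at /sys/fs/cgroup with one child group per container
and the cpuset controller enabled; the path and 16-core host are placeholders.
"""
import time
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup/containers")  # placeholder path
HOST_CPUS = list(range(16))                      # placeholder core count

def cpu_usage_usec(group: Path) -> int:
    # cgroup v2 reports cumulative CPU time in cpu.stat as "usage_usec <n>"
    for line in (group / "cpu.stat").read_text().splitlines():
        key, _, val = line.partition(" ")
        if key == "usage_usec":
            return int(val)
    return 0

def set_cpuset(group: Path, cpus: list[int]) -> None:
    (group / "cpuset.cpus").write_text(",".join(map(str, cpus)))

def container_groups() -> list[Path]:
    return [g for g in CGROUP_ROOT.iterdir() if g.is_dir()]

def rebalance(interval_s: float = 10.0) -> None:
    prev = {g: cpu_usage_usec(g) for g in container_groups()}
    while True:
        time.sleep(interval_s)
        cur = {g: cpu_usage_usec(g) for g in container_groups()}
        # Rank containers by CPU burned since the last pass (the "more data,
        # less frequent decision" part lives here).
        busy = sorted(cur, key=lambda g: cur[g] - prev.get(g, 0), reverse=True)
        half = len(HOST_CPUS) // 2
        dedicated, shared = HOST_CPUS[:half], HOST_CPUS[half:]
        # Busiest containers each get a private core; the rest share the remainder.
        for i, group in enumerate(busy):
            set_cpuset(group, [dedicated[i]] if i < len(dedicated) else shared)
        prev = cur

if __name__ == "__main__":
    rebalance()
```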
It's an MoE model, so it offers a different memory vs. compute/latency trade-off than standard dense models. Quoting the blog post:
> DBRX uses only 36 billion parameters at any given time. But the model itself is 132 billion parameters, letting you have your cake and eat it too in terms of speed (tokens/second) vs performance (quality).
Despite both being MoEs, the architectures are different. DBRX has double the number of experts in the pool (16 vs 8 for Mixtral) and double the active experts (4 vs 2).
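For intuition on the total-vs-active parameter numbers, here is a toy top-k router in NumPy (tiny made-up dimensions; only the 16-experts/4-active ratio mirrors DBRX, and this is not its real code): each token's FFN pass only touches the k selected experts' weights.

```python
import numpy as np

# Toy MoE routing: 16 experts in the pool, 4 active per token.
rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 64, 256, 16, 4

router_w = rng.standard_normal((d_model, n_experts))
experts_w1 = rng.standard_normal((n_experts, d_model, d_ff))
experts_w2 = rng.standard_normal((n_experts, d_ff, d_model))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (d_model,) single token. Returns the gated mix of top-k expert outputs."""
    logits = x @ router_w                                    # one score per expert
    top = np.argsort(logits)[-top_k:]                        # indices of the k best experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the selected k
    out = np.zeros(d_model)
    for gate, e in zip(gates, top):
        out += gate * (np.maximum(x @ experts_w1[e], 0) @ experts_w2[e])
    return out

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)  # (64,)
# Only 4 of the 16 expert FFNs ran, i.e. roughly a quarter of the expert weights
# per token, which is how total params (132B) and active params (36B) can differ so much.
```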
The guy who was the CEO of Atos, a top-10 digital services (?) company worldwide, and before that of the biggest French telecom (France Telecom/Orange). He was named three times in Harvard Business Review's top 100 CEOs. He was Minister of the Economy, Finance and Industry for two years. He has also written a few sci-fi books.
And for you, he only became popular when he was named European Commissioner? Which company did he fine to become famous in his new position?
The only actual example I can think of is Schrems, and he didn't fine anyone; he merely dismantled the shoddily constructed EU-US privacy-law circumventions (Privacy Shield etc.). And I had to look up his first name (Max) because I only knew his surname from the court cases (notably Schrems II).
Heh, great illustration of how hollow the "politician acting for fame" accusation is: the only name you could think of is that of an EFF-like activist, and he's not a politician at all :D
Turns out I had him down as a "politician" because I conflated him with the Greens politician who was making elaborate flowcharts about Brexit on social media. Incidentally, I forgot his name too.
I saw he's only listed as a "lawyer" (not a political position or party membership) when looking up his full name, but given that I know literally nothing else about him (his face doesn't even look familiar), I left it at that.