11. notice that there's a unicode rendering error ("'" for apostrophe) in the kernel_initializer and bias_initializer default arguments in the documentation, and wonder why on earth such a high-level API would expose lora_rank as a first-class construct. Also, 3 out of the 5 "Used in the guide" links point to TF1-to-TF2 migration articles - TF2 was released 5 years ago.
Yep, in Netflix's case they pack bare-metal instances with a very large number of containers and oversubscribe them (similar to what Borg reports: hundreds of containers per machine is common), so there are always more runnable threads than CPUs and your runqueues fill up.
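To make the runqueue point concrete, here's a minimal sketch (assuming a kernel built with CONFIG_SCHED_INFO, which most distro kernels enable) that samples /proc/&lt;pid&gt;/schedstat to estimate how much of a process's runnable time is spent queued rather than on-CPU:

```python
#!/usr/bin/env python3
"""Rough run-queue wait estimate for one PID via /proc/<pid>/schedstat.

Sketch only: assumes a kernel with CONFIG_SCHED_INFO, where the file holds
three values: time on CPU (ns), time waiting on a runqueue (ns), timeslices.
"""
import sys
import time

def read_schedstat(pid: int) -> tuple[int, int]:
    with open(f"/proc/{pid}/schedstat") as f:
        on_cpu_ns, wait_ns, _slices = (int(x) for x in f.read().split())
    return on_cpu_ns, wait_ns

def sample(pid: int, interval_s: float = 1.0) -> None:
    prev_cpu, prev_wait = read_schedstat(pid)
    while True:
        time.sleep(interval_s)
        cpu, wait = read_schedstat(pid)
        d_cpu, d_wait = cpu - prev_cpu, wait - prev_wait
        total = d_cpu + d_wait
        pct = 100.0 * d_wait / total if total else 0.0
        print(f"ran {d_cpu/1e6:.1f} ms, waited {d_wait/1e6:.1f} ms "
              f"({pct:.0f}% of runnable time spent queued)")
        prev_cpu, prev_wait = cpu, wait

if __name__ == "__main__":
    sample(int(sys.argv[1]))
```

On an oversubscribed host that queued fraction climbs well above what you'd see on a dedicated box.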
I'm curious about the capacity of the bare-metal hosts you operate such that you can oversubscribe CPU without exhausting memory first or forcing processes to swap (which leads to significantly worse latency than typical scheduling delays). My experience is that most machines end up memory-bound, because modern software, especially the Java workloads I know Netflix runs a lot of, can be a profligate memory consumer.
Workloads tend to average out if you pack dozens or hundreds into one host. Some need more CPU and some need more memory, but an average ratio emerges... I like 4 GB/core.
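As a back-of-the-envelope illustration of that averaging (all workload numbers below are invented, not Netflix's), you can compute the blended GB/core of a container mix and compare it against a 4 GB/core host:

```python
# Toy capacity check: does a mix of containers blend out to ~4 GB/core?
# All workload shapes here are made up purely for illustration.
containers = [
    # (name, cpu_cores_requested, memory_gb_requested, count)
    ("java-api",     4,   16, 30),   # memory-heavy
    ("batch-encode", 8,    8, 10),   # cpu-heavy
    ("sidecar",      0.5,  1, 100),  # tiny
]

total_cores = sum(cpu * n for _, cpu, _, n in containers)
total_mem   = sum(mem * n for _, _, mem, n in containers)
blend = total_mem / total_cores
print(f"{total_cores:.0f} cores, {total_mem:.0f} GB -> {blend:.2f} GB/core")

host_gb_per_core = 4.0  # the ratio mentioned above
if blend <= host_gb_per_core:
    print("CPU runs out first on a 4 GB/core host: oversubscribe CPU.")
else:
    print("Memory runs out first: this mix needs a fatter-memory host.")
```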
Yep. In Netflix's case each Titus host can run hundreds of containers per bare-metal instance at any given time. One advantage of running a multi-tenant platform like this is that you get better observability into multi-tenancy issues, since you're doing the scheduling yourself and know who is colocated with whom. It's much harder to debug noisy-neighbor issues when they're happening on the cloud provider's side and your caches get thrashed by random other AWS customers.
One thing I was pitching internally when advocating for this platform is that, once you have the scale for the economics to make sense, you can reclaim some of AWS's margins instead of having your cold, tiny VMs subsidize other AWS customers' higher performance. If you run the multi-tenant platform yourself, you can oversubscribe every app in a way that makes sense for your business and trade latency or throughput of software for $ on a per-container basis, so you can make much more granular and optimal decisions globally, vs. having each team individually right-size their own app deployed on VMs and sharing CPU caches with randos.
I remember once at Netflix we investigated a weird latency issue on a random load balancer instance and got AWS involved: it turned out to be a noisy neighbor on the underlying VM that gets chopped up into multiple customer-facing LB instances.
According to [1], Fargate is actually not using Firecracker, but probably something closer to a single container running in a single-tenant EC2 VM. If true, this makes VM boot-time optimizations and warm pooling even more important for such a product.
Beyond "kernel programming is hard", there are a few other reasons why it made sense for us:
- observability & maintenance: it's much easier to implement and ship this type of change in userspace than to roll out a kernel fork. We also built custom A/B infra to be able to evaluate these optimizations.
- the kernel is really good at making reasonable decisions at high frequency based on a limited amount of data and heuristics, but those decisions are far from optimal in all scenarios. In contrast, in userspace we can make better decisions based on more data (or ML predictions), but make them less frequently (the sketch below shows the general shape of that loop).
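To illustrate the shape of such a userspace loop (this is not Netflix's actual agent; the cgroup path, core count, and policy below are placeholders), one can periodically rank containers by recent CPU use and rewrite their cpuset assignments, assuming cgroup v2 with the cpuset controller enabled:

```python
"""Skeleton of a userspace CPU-isolation loop (illustrative only, not Titus's agent).

Assumes cgroup v2 mounted at /sys/fs/cgroup with one child group per container
and the cpuset controller enabled; the path and 16-core host are placeholders.
"""
import time
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup/containers")  # placeholder path
HOST_CPUS = list(range(16))                      # placeholder core count

def cpu_usage_usec(group: Path) -> int:
    # cgroup v2 reports cumulative CPU time in cpu.stat as "usage_usec <n>"
    for line in (group / "cpu.stat").read_text().splitlines():
        key, _, val = line.partition(" ")
        if key == "usage_usec":
            return int(val)
    return 0

def set_cpuset(group: Path, cpus: list[int]) -> None:
    (group / "cpuset.cpus").write_text(",".join(map(str, cpus)))

def container_groups() -> list[Path]:
    return [g for g in CGROUP_ROOT.iterdir() if g.is_dir()]

def rebalance(interval_s: float = 10.0) -> None:
    prev = {g: cpu_usage_usec(g) for g in container_groups()}
    while True:
        time.sleep(interval_s)
        cur = {g: cpu_usage_usec(g) for g in container_groups()}
        # Rank containers by CPU burned since the last pass (the "more data,
        # less frequent decision" part lives here).
        busy = sorted(cur, key=lambda g: cur[g] - prev.get(g, 0), reverse=True)
        half = len(HOST_CPUS) // 2
        dedicated, shared = HOST_CPUS[:half], HOST_CPUS[half:]
        # Busiest containers each get a private core; the rest share the remainder.
        for i, group in enumerate(busy):
            set_cpuset(group, [dedicated[i]] if i < len(dedicated) else shared)
        prev = cur

if __name__ == "__main__":
    rebalance()
```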
It's an MoE model, so it offers a different memory vs. compute/latency trade-off than standard dense models. Quoting the blog post:
> DBRX uses only 36 billion parameters at any given time. But the model itself is 132 billion parameters, letting you have your cake and eat it too in terms of speed (tokens/second) vs performance (quality).
Despite both being MoEs, the architectures are different. DBRX has double the number of experts in the pool (16 vs 8 for Mixtral) and double the active experts (4 vs 2).
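For intuition on the total-vs-active parameter numbers, here is a toy top-k router in NumPy (tiny made-up dimensions; only the 16-experts/4-active ratio mirrors DBRX, and this is not its real code): each token's FFN pass only touches the k selected experts' weights.

```python
import numpy as np

# Toy MoE routing: 16 experts in the pool, 4 active per token.
rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 64, 256, 16, 4

router_w = rng.standard_normal((d_model, n_experts))
experts_w1 = rng.standard_normal((n_experts, d_model, d_ff))
experts_w2 = rng.standard_normal((n_experts, d_ff, d_model))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (d_model,) single token. Returns the gated mix of top-k expert outputs."""
    logits = x @ router_w                                    # one score per expert
    top = np.argsort(logits)[-top_k:]                        # indices of the k best experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the selected k
    out = np.zeros(d_model)
    for gate, e in zip(gates, top):
        out += gate * (np.maximum(x @ experts_w1[e], 0) @ experts_w2[e])
    return out

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)  # (64,)
# Only 4 of the 16 expert FFNs ran, i.e. roughly a quarter of the expert weights
# per token, which is how total params (132B) and active params (36B) can differ so much.
```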
The guy who was the CEO of Atos, a top-10 digital services (?) company worldwide, and before that of the biggest French telecom (France Telecom/Orange). He was named three times in Harvard Business Review's top 100 CEOs. He was Minister of the Economy, Finance and Industry for two years. He has also written a few sci-fi books.
And for you, he only became popular when he was named European Commissioner? Which company did he fine to become famous in his new position?
The only actual example I can think of is Schrems, and he didn't fine anyone; he merely dismantled the shoddily constructed EU-US privacy-law circumventions (Privacy Shield etc.). And I had to look up his first name (Max) because I only knew his surname from the court cases (notably Schrems II).
Heh, great illustration of how hollow the "politician acting for fame" accusation is: the only name you could think of is that of an EFF-like activist, and he's not a politician at all :D
Turns out I had him down as a "politician" because I conflated him with the Greens politician who was making elaborate flowcharts about Brexit on social media. Incidentally, I forgot his name too.
I saw he's only listed as a "lawyer" (not a political position or party membership) when looking up his full name, but given that I know literally nothing else about him (his face doesn't even look familiar), I left it at that.