
I mean, we (zml) clocked the MI300X ($20k) at 30% faster than the H100 ($30k).

So…


That was then. Now it's about MI325 vs. B100.


What about power consumption? edit: My understanding from about a year ago is that AMD's and NVDA's chips were priced similarly relative to their performance per watt.


You can look us up at https://github.com/zml/zml; we fix that.


Wait, looking at that link I don't see how it avoids downloading CUDA or ROCm. Do you use MLIR to compile to GPU without using the vendor-provided tooling at all?


We do use ROCm and CUDA. We just sandbox them with the model and download only the needed parts, which are about 1/10th of the size.
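
A hedged sketch of the general idea in Python (the library names and URL scheme are purely illustrative, not zml's actual mechanism):

    import os
    import urllib.request

    # Hypothetical subset: just the runtime libs inference needs,
    # instead of the full multi-gigabyte toolkit.
    NEEDED_LIBS = ["libamdhip64.so", "librocblas.so", "libhipblaslt.so"]

    def fetch_minimal_runtime(base_url, dest):
        os.makedirs(dest, exist_ok=True)
        for lib in NEEDED_LIBS:
            urllib.request.urlretrieve(f"{base_url}/{lib}", os.path.join(dest, lib))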


Hi, we (ZML) fix that: https://github.com/zml/zml


Works out of the box on our MI300X. Fantastic work, steeve!

https://x.com/HotAisle/status/1842245896085356949


This is pretty cool. Is there a document that shows which AMD drivers are supported out of the box?


We are in line with ROCm 6.2 support. We actually just opened a PR to bump to 6.2.2: https://github.com/zml/zml/pull/39


We (ZML) measured MI300X at 30% faster than H100. These are great chips!


Pretty easy; usually the hardest part is figuring out what the Python code is doing.


The last one is so very true


Bazel is amazing; doing C++ with anything else is like going back to the stone age.

The Bazel team has done an amazing job; the VM is embedded and trimmed. It’s as easy as download and run.

And worst case, you can invest in Buck2.
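
For anyone who hasn't tried it, a minimal C++ target is just a few lines of BUILD file. (A generic sketch, not any particular project's setup.)

    # BUILD.bazel: declare a C++ binary; build with `bazel build //:hello`
    cc_binary(
        name = "hello",
        srcs = ["hello.cc"],
    )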


Yes, that’s how it works (pipeline parallelism).
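
A minimal sketch of the idea in Python (toy dimensions and random weights, purely for illustration):

    import numpy as np

    EMBED, N_LAYERS = 1024, 4   # toy sizes; real models are far larger
    rng = np.random.default_rng(0)

    def layer(x, w):
        # stand-in for a transformer layer: one matmul plus a nonlinearity
        return np.tanh(x @ w)

    # Each device keeps its half of the layer weights locally.
    dev1 = [rng.standard_normal((EMBED, EMBED)).astype(np.float32) * 0.02
            for _ in range(N_LAYERS // 2)]
    dev2 = [rng.standard_normal((EMBED, EMBED)).astype(np.float32) * 0.02
            for _ in range(N_LAYERS // 2)]

    x = rng.standard_normal(EMBED).astype(np.float32)
    for w in dev1:
        x = layer(x, w)
    # Only x (EMBED floats) crosses the network here; the weights never move.
    for w in dev2:
        x = layer(x, w)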


Interesting. Let's do the math ...

Let's say the model has 50B parameters and 50 layers. That would mean about one billion values would have to travel over wifi for every generated token?

I wonder how much data that is in bytes, and how long it would take to transfer.


It's not the parameters that are sent, it's the layer outputs. That makes for a few thousand floats per token.


Whoops! I would have thought the number of neurons roughly equals the number of parameters, but you are right: the number of parameters is much higher.


The embedding size is only 8k while the parameters are 70B, so it's a huge difference.
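
Rough numbers from those figures, assuming fp16 (2 bytes per value); the 100 Mbps link speed is an assumption for illustration:

    embed_dim = 8192            # embedding size, per the thread
    n_params = 70e9             # 70B parameters
    fp16 = 2                    # bytes per value

    per_token = embed_dim * fp16     # 16 KiB of activations per token per boundary
    all_weights = n_params * fp16    # 140 GB if the weights themselves had to move

    wifi_bytes_per_s = 100e6 / 8     # assumed 100 Mbps link
    print(per_token / 1024)                # 16.0 (KiB)
    print(per_token / wifi_bytes_per_s)    # ~0.0013 s, about 1.3 ms per boundary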


It’s outrage and bots.

Fuck Musk.


1. John Oliver’s piece on science reporting

2. Didier Raoult

