
Maybe not so fast. Other users are reporting that it’s not actually running properly in environments with limited RAM. The reduced memory usage might be more of a reporting misunderstanding, not an actual reduction in memory usage.



It will run; it will just have to reread the model for every new token.


With NVMe Gen 4 SSDs this might not be that huge of an issue, and it's for sure much cheaper than investing in RAM.
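Rough sketch of what "reread the model for every new token" costs, assuming the whole model is streamed from the SSD once per token and nothing stays in the page cache. The 14 GB model size and both read speeds are made-up illustration values, not measurements, and compute time is ignored.

```python
def seconds_per_token(model_gb: float, read_gb_per_s: float) -> float:
    """Lower bound on per-token latency if every weight is re-read from disk once per token."""
    if read_gb_per_s <= 0:
        raise ValueError("read bandwidth must be positive")
    return model_gb / read_gb_per_s


# Hypothetical numbers, for illustration only.
MODEL_GB = 14.0
for label, bandwidth in [("3.5 GB/s sustained", 3.5), ("7 GB/s peak", 7.0)]:
    t = seconds_per_token(MODEL_GB, bandwidth)
    print(f"{label}: {t:.1f} s/token ({1 / t:.2f} tokens/s)")
```

So under these assumptions the disk alone caps generation at well under one token per second, and whether a drive can sustain its peak read speed matters a lot.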


I don't believe the consumer ones actually have the sustained sequential read speed to saturate Gen 4.


Gen 5 PCIe is ~4 GB/s per lane, and AMD Genoa chips have 128 such lanes. That means on the order of 500 GB/s aggregate throughput, which is comparable to the aggregate theoretical throughput of the 12-channel DDR5 RAM of the Genoa CPUs.

In other words, with enough data interleaving between enough NVMe SSDs, you should have SSD throughput of the same order of magnitude as the system RAM.

The weights are static, so it’s just reads.
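The arithmetic, as a quick sketch. The per-lane and lane-count figures are the ones quoted above; the DDR5 side assumes DDR5-4800 with 8 bytes per transfer per channel, which is my assumption and not something stated in the thread. These are theoretical peaks, not what a real SSD array would deliver.

```python
PCIE5_GB_PER_S_PER_LANE = 4.0  # ~4 GB/s per lane, as quoted above
GENOA_PCIE_LANES = 128         # lane count as quoted above

# Assumption: DDR5-4800, 64-bit (8-byte) data path per channel.
DDR5_MEGATRANSFERS_PER_S = 4800
BYTES_PER_TRANSFER = 8
DDR5_CHANNELS = 12


def pcie_aggregate_gb_per_s(lanes: int = GENOA_PCIE_LANES,
                            per_lane: float = PCIE5_GB_PER_S_PER_LANE) -> float:
    """Theoretical aggregate PCIe throughput in GB/s, one direction."""
    return lanes * per_lane


def ddr5_aggregate_gb_per_s(channels: int = DDR5_CHANNELS,
                            mt_per_s: int = DDR5_MEGATRANSFERS_PER_S,
                            bytes_per_transfer: int = BYTES_PER_TRANSFER) -> float:
    """Theoretical aggregate DRAM throughput in GB/s across all channels."""
    return channels * mt_per_s * bytes_per_transfer / 1000


pcie = pcie_aggregate_gb_per_s()
dram = ddr5_aggregate_gb_per_s()
print(f"PCIe Gen 5 x{GENOA_PCIE_LANES}: {pcie:.1f} GB/s")
print(f"DDR5 x{DDR5_CHANNELS} channels: {dram:.1f} GB/s")
print(f"ratio: {pcie / dram:.2f}")
```

Both land in the same few-hundred-GB/s range, which is the "same order of magnitude" point.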


Sequential reads are the best-case scenario for SSDs. Writes degrade, as they're first committed to SLC cache before being written to slower TLC/QLC.



