
For the most part this post was easy to read, and I could feel the collective excitement of the team. I came away feeling like I'd learned something and ready to try it myself. The only place the post gets a little fuzzy is "...store the quantized parameters in a selectable data type, where that storage data type is the same data type as the “computation type” of the model". I assume "selectable data type" is the float size of the quantization?



We've got a technical post with all the juicy details coming next week. But that bit refers to packing the 4-bit weights into a type FSDP is happy to shard (like float16 or float32) which matches the other non-quantized bits of the model. This way FSDP will happily wrap and shard all the parameters as if they were just normal floats.
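
To make the packing idea concrete, here is a rough sketch using only the Python standard library. It is not the actual implementation from the post: the function names are made up for illustration, and I'm assuming two 4-bit codes per byte (low nibble first) with float32 as the storage type. The point is only that the bytes are reinterpreted, never converted, so the "floats" are meaningless as numbers but the 4-bit codes come back out intact.

```python
from array import array


def pack_nibbles(codes):
    """Pack 4-bit integers (0..15) two per byte, low nibble first."""
    if len(codes) % 2:
        raise ValueError("need an even number of 4-bit codes")
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("every code must fit in 4 bits")
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_nibbles(packed):
    """Inverse of pack_nibbles: split each byte back into two 4-bit codes."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)  # low nibble was stored first
        codes.append(byte >> 4)    # high nibble second
    return codes


def as_float32_storage(packed):
    """Reinterpret the packed bytes as float32 elements (raw copy, no conversion)."""
    storage = array("f")
    if len(packed) % storage.itemsize:
        raise ValueError("packed length must be a multiple of the float32 size")
    storage.frombytes(packed)
    return storage


def from_float32_storage(storage):
    """Recover the original packed bytes from the float32 storage."""
    return storage.tobytes()


if __name__ == "__main__":
    codes = [3, 15, 0, 7, 9, 1, 12, 4]       # eight hypothetical 4-bit weights
    packed = pack_nibbles(codes)              # 4 bytes
    storage = as_float32_storage(packed)      # 1 float32 element
    restored = unpack_nibbles(from_float32_storage(storage))
    print(len(packed), len(storage), restored == codes)
```

Eight 4-bit codes end up as a single float32 element, so anything that only shards and moves float32 buffers around can handle it without knowing the contents are quantized. The elements must never be touched arithmetically, since any float operation would scramble the packed bits.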



