
Any idea what the main tricks are for achieving gains over FSDP?



The blog post seems to contain more details and the core ideas: https://medium.com/yandex/yafsdp-a-tool-for-faster-llm-train...


Odd that they don’t expand on this:

> In Yandex’s pre-trainings, the implementation of YaFSDP along with other memory optimization strategies resulted in a speed gain of 45%.



