
Another element is that Mamba required a very custom implementation, down to custom fused kernels, which I expect would need to be implemented in DeepSpeed or an equivalent library for a larger training run spanning thousands of GPUs.





