
Which means Codestral Mamba and DeepSeek both lead four benchmarks. Kinda takes the air out of the announcement a bit.



It should be corrected, but the interesting aspect of this release is the architecture. Staying competitive while only needing linear inference time and supporting a 256k context is pretty neat.
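Roughly what "linear inference time" buys you: a state-space model carries a fixed-size state forward, so each new token costs O(1) work and a whole sequence costs O(n), versus O(n^2) for full attention. A toy sketch of the recurrence (time-invariant here; Mamba's actual selective scan is more involved):

    import numpy as np

    def ssm_scan(xs, A, B, C):
        # Fixed-size state updated once per token: O(n) total work
        # and constant memory, regardless of context length.
        state = np.zeros(A.shape[0])
        outputs = []
        for x in xs:
            state = A @ state + B * x   # O(1) work per token
            outputs.append(C @ state)
        return outputs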


THIS. People don't realize the importance of Mamba competing on par with transformers.


Linear attention is terrible for chatbot-style request-response applications, but if you're giving the model the prompt and then letting it scan the codebase and fill in the middle, linear attention should work pretty decently. The performance benefit should also have a much bigger impact, since you're reprocessing the same code over and over again.
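For context, fill-in-the-middle just means the model sees the code before and after the insertion point and generates the span between them. A generic sketch of the prompt layout (the token names are illustrative, not necessarily Codestral's actual special tokens):

    def build_fim_prompt(prefix: str, suffix: str) -> str:
        # The model completes the span between prefix and suffix;
        # with linear-time scanning, re-reading a large surrounding
        # codebase on every request stays cheap.
        return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"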


They're in roughly the same class, but they have totally different architectures.

DeepSeek uses a 4k sliding window, compared to Codestral Mamba's 256k+ tokens.
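The difference is easiest to see in the attention mask: with a 4k sliding window, token i can only attend to the previous 4k tokens, while an SSM carries a compressed state across the full context. A toy mask for illustration (hypothetical helper, not either model's actual code):

    import numpy as np

    def sliding_window_mask(seq_len: int, window: int = 4096):
        # Token i may attend only to tokens in (i - window, i];
        # anything older falls out of the window entirely.
        i = np.arange(seq_len)[:, None]
        j = np.arange(seq_len)[None, :]
        return (j <= i) & (j > i - window)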



