It's twice the size of Mixtral and barely beats it.



It's a MoE model, so it offers a different memory/compute latency trade-off than standard dense models. Quoting the blog post:

> DBRX uses only 36 billion parameters at any given time. But the model itself is 132 billion parameters, letting you have your cake and eat it too in terms of speed (tokens/second) vs performance (quality).
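Concretely, in a top-k MoE layer the router only runs k of the n expert FFN blocks per token, so per-token compute tracks the 36B active parameters rather than the full 132B, while all of the experts still have to sit in memory. A rough PyTorch-style sketch (simplified for illustration; this is not the actual DBRX implementation, and the layer sizes are made up):

    import torch
    import torch.nn as nn

    class TopKMoEFFN(nn.Module):
        # Simplified top-k mixture-of-experts feed-forward layer (illustrative only).
        def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=4):
            super().__init__()
            self.top_k = top_k
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            ])

        def forward(self, x):  # x: (n_tokens, d_model)
            scores = self.router(x)                                # (n_tokens, n_experts)
            weights, idx = torch.topk(scores, self.top_k, dim=-1)  # choose k experts per token
            weights = torch.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                for slot in range(self.top_k):
                    mask = idx[:, slot] == e                       # tokens routed to expert e in this slot
                    if mask.any():
                        out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
            return out

Every expert's weights are resident, but each token only pays the FLOPs of top_k of them, which is the speed-vs-quality trade-off the quote is describing.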


Mixtral is also a MoE model, hence the name: Mixtral.


Despite both being MoEs, the architectures are different. DBRX has double the number of experts in the pool (16 vs 8 for Mixtral) and doubles the active experts per token (4 vs 2).
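One consequence of routing 4-of-16 instead of 2-of-8: the space of possible expert combinations per token is much larger. A quick back-of-envelope check, using only the expert counts above:

    from math import comb

    # Number of expert subsets a token can be routed to under top-k routing
    for name, n_experts, top_k in [("Mixtral", 8, 2), ("DBRX", 16, 4)]:
        print(f"{name}: C({n_experts},{top_k}) = {comb(n_experts, top_k)} combinations per token")
    # Mixtral: C(8,2)  = 28
    # DBRX:    C(16,4) = 1820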



