Hacker News new | past | comments | ask | show | jobs | submit login

Yep! Although MoE use several "expert" linear layers within a single network, and generally the "routing" is not based on high-level semantics, but token specialization, as disuccussed by Fuzhao Xue, author of OpenMoE, in one of our reading groups: https://www.youtube.com/watch?v=k3QOpJA0A0Q&t=1547s



Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: