Yep! Although MoE use several "expert" linear layers within a single network, and generally the "routing" is not based on high-level semantics, but token specialization, as disuccussed by Fuzhao Xue, author of OpenMoE, in one of our reading groups: https://www.youtube.com/watch?v=k3QOpJA0A0Q&t=1547s