That doesn't clarify it for me. The same parameters are being used for every layer for every token. Yes, there is this differentiable lookup in attention like in MoE - but routing is about more than just differentiable lookup, it is about selecting on parameters not state.