For example, if you have a classification task with more than (say) 2000 classes, do you train a single mega model, or multiple smaller models (for speed/efficiency) and combine their outputs? Are there any best practices that have worked for you in production?