You will want to use hierarchical outputs in this case. Take a look at Hinton et al.'s knowledge distillation paper, 'Distilling the Knowledge in a Neural Network' (2015).
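To make "hierarchical outputs" a bit more concrete, here is a minimal sketch, assuming a two-level class hierarchy where each fine class belongs to exactly one coarse group. The names `hierarchical_probs` and `groups`, and the example logits, are made up for illustration and are not taken from the paper. The idea is that the network emits one set of logits over coarse groups and another over fine classes, and the fine-class probability is P(group) * P(class | group).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hierarchical_probs(coarse_logits, fine_logits, groups):
    """Combine coarse and fine logits into fine-class probabilities.

    groups[k] lists the fine-class indices under coarse group k
    (hypothetical layout; every fine class must appear in exactly one group).
    """
    p_coarse = softmax(coarse_logits)
    p_fine = [0.0] * len(fine_logits)
    for k, members in enumerate(groups):
        # Softmax restricted to the fine classes inside this group.
        within = softmax([fine_logits[i] for i in members])
        for i, p in zip(members, within):
            p_fine[i] = p_coarse[k] * p
    return p_fine

# Illustrative values only: 2 coarse groups covering 5 fine classes.
groups = [[0, 1], [2, 3, 4]]
coarse_logits = [0.5, -0.2]
fine_logits = [1.0, 0.1, -0.3, 0.7, 0.0]
probs = hierarchical_probs(coarse_logits, fine_logits, groups)
```

The resulting `probs` form a valid distribution over the fine classes, and the mass inside each group adds up to that group's coarse probability, so you can train with a loss on either level or both.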