This is actually how EfficientNet trains, using random truncation of the network...

sdenton4 6 months ago | parent | context | favorite | on: Mixture-of-Depths: Dynamically allocating compute ...

This is actually how EfficientNet trains, using random truncation of the network during training. It does just fine... The game is that each layer needs to get as close as it can to good output, improving in the previous activation quality.