Hacker News
new
|
past
|
comments
|
ask
|
show
|
jobs
|
submit
login
sdenton4
6 months ago
|
parent
|
context
|
favorite
| on:
Mixture-of-Depths: Dynamically allocating compute ...
This is actually how EfficientNet trains, using random truncation of the network during training. It does just fine... The game is that each layer needs to get as close as it can to good output, improving in the previous activation quality.
Guidelines
|
FAQ
|
Lists
|
API
|
Security
|
Legal
|
Apply to YC
|
Contact
Search: