Going from 25% to 22% top-1 error is a massive jump on ImageNet and very meaningful in a lot of applications. That said, there is no reason to believe attention-based models are the only way forward on image classification. The humble ResNet-50 can get near 22% top-1 error when trained properly.
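To make that claim easy to check, here is a minimal sketch of measuring a ResNet-50's top-1 error with a torchvision checkpoint trained under an improved recipe; the `./imagenet/val` path, batch size, and worker count are placeholders, and the exact number you get depends on the recipe and evaluation setup.

```python
# Minimal sketch (assumes torchvision >= 0.13 and an ImageNet-1k val split
# laid out as class folders under ./imagenet/val).
import torch
from torchvision import datasets, models

# IMAGENET1K_V2 weights come from an improved training recipe; the exact
# top-1 figure depends on the recipe and the evaluation pipeline.
weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()

preprocess = weights.transforms()  # resize/crop/normalize used at eval time
val_set = datasets.ImageFolder("./imagenet/val", transform=preprocess)
loader = torch.utils.data.DataLoader(val_set, batch_size=128, num_workers=8)

correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()

print(f"top-1 error: {1 - correct / total:.3f}")
```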
Whether we need attention or not is a more interesting question for seq2seq models on text data.