tylerneylon on July 25, 2023 | on: Attention Is Off By One

I like your description because it's relatively succinct and intuitively suggests why the modified softmax can help the model handle edge cases. It's nice to ask: How could the model realistically learn to correctly handle situation X?
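For concreteness, here is a minimal sketch of the edge case. It assumes the "modified softmax" is the variant proposed in the linked post, which adds 1 to the denominator: exp(x_i) / (1 + sum_j exp(x_j)). The function names `softmax` and `softmax_one` are hypothetical, chosen for illustration. When every attention score is strongly negative (nothing relevant to attend to), the standard softmax is still forced to hand out a total weight of 1, while the modified version can let the total weight shrink toward 0.

```python
import math


def softmax(scores):
    """Standard softmax: the outputs always sum to 1."""
    m = max(scores)  # shift by the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_one(scores):
    """Modified softmax (assumed form): exp(x_i) / (1 + sum_j exp(x_j)).

    The extra 1 behaves like an implicit option with score 0, so the
    outputs can sum to well below 1 when all scores are very negative.
    """
    # Shift by m = max(max score, 0) so no exponent is positive;
    # after the shift, the implicit "1" becomes exp(-m).
    m = max(max(scores), 0.0)
    exps = [math.exp(s - m) for s in scores]
    total = math.exp(-m) + sum(exps)
    return [e / total for e in exps]


# A head that finds nothing relevant: all scores strongly negative.
irrelevant = [-10.0, -10.0, -10.0]
print(sum(softmax(irrelevant)))      # standard softmax: total weight is 1
print(sum(softmax_one(irrelevant)))  # modified softmax: total weight near 0
```

The design point is that the model only has to push all of its scores down to "opt out", which seems like a much easier thing to learn than engineering some harmless place to dump the leftover attention weight.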