The benefit over regular transformers is efficiency: it performs fewer operations, whereas self-attention in the original transformer has quadratic complexity in the number of input tokens.
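To make the quadratic cost concrete, here is a rough sketch that counts only the multiply-adds needed to build the attention score matrix in standard self-attention. It assumes a single head, and the function name and the head dimension of 64 are illustrative choices, not taken from any particular model.

```python
def attention_score_multiply_adds(num_tokens: int, head_dim: int) -> int:
    """Multiply-adds to form the (num_tokens x num_tokens) score matrix Q @ K^T.

    Each of the num_tokens * num_tokens entries is a dot product of
    two vectors of length head_dim.
    """
    return num_tokens * num_tokens * head_dim


# Hypothetical sequence lengths; head_dim=64 is an assumed value.
for n in (512, 1024, 2048):
    print(n, attention_score_multiply_adds(n, head_dim=64))
```

Under these assumptions, doubling the number of tokens quadruples the count, which is the growth an efficient variant tries to avoid.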