I do not understand what exactly you are complaining about.
It is true that the "Attention Is All You Need"[0] paper requires some basic understanding of and experience with ML architectures and engineering, but the paper's target audience certainly has that background. It would be unnecessary for every academic paper to start with a first-principles explanation of what is considered common knowledge in the field.
> wouldn't surprise me if you go to try it and its wrong, and none of the real problems have been solved
But attention-based transformer architectures obviously are solving real-world problems, by outperforming previous architectures in real-world applications.
[0] https://arxiv.org/abs/1706.03762