What came before was regression, which to this day is the number one method if we want something interpretable, especially if we know which functional forms our variables follow. Self-attention is also very similar to a correlation matrix: both score how strongly pairs of items relate to each other. In a way, neural networks are just a bunch of regression models stacked on top of each other, with some normalization and a nonlinearity between them. It's cool, however, how closely it resembles biology.
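
To make the correlation-matrix comparison concrete, here is a rough NumPy sketch. It rests on a simplifying assumption that isn't how real models work: the query and key projections are the identity, so the attention scores are plain dot products between token vectors. Under that assumption, and with each token vector standardized, the dot products divided by the dimension are exactly the Pearson correlations between tokens. The names (`standardize_rows`, `softmax`, the toy sizes) are made up for illustration.

```python
import numpy as np

def standardize_rows(X):
    # Center each token vector and scale it to unit (population) standard deviation.
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def softmax(Z):
    # Row-wise softmax, shifted by the row max for numerical stability.
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_tokens, d = 4, 16                  # toy sizes, chosen arbitrarily
X = rng.normal(size=(n_tokens, d))   # one row per token

Xs = standardize_rows(X)

# Dot products of standardized rows, divided by d, are Pearson correlations.
similarity = Xs @ Xs.T / d
correlation = np.corrcoef(X)         # rows are treated as variables

# Scaled dot-product attention weights, ASSUMING identity query/key projections.
attention = softmax(Xs @ Xs.T / np.sqrt(d))

print(np.allclose(similarity, correlation))
print(attention.sum(axis=1))
```

The analogy only goes so far. Real attention uses learned query and key projections, so the score matrix is generally not symmetric, and the softmax turns each row into weights that sum to 1 rather than values between -1 and 1.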