Note that Magic: The Gathering is Turing Complete https://arxiv.org/abs/1904.098...

Note that Magic: The Gathering is Turing Complete

https://arxiv.org/abs/1904.09828

Something being Turing Complete just that means in principle it could be used to solve any computation problem. But this may require infinite memory or infinite time.

printf() is Turing complete as an other example.

https://www.ioccc.org/2020/carlini/index.html

The paper showed that Transformer with positional encodings and rational activation functions is Turing complete.

Rational activation functions with arbitrary precision make sure that you are in the smaller countable infinities, where floats run into that cardinality of the continuum problem.

While all nets that use attention are feed forward and thus effectively DAGs, they add in positional encodings to move it from well-founded to well ordered.

While those constraints allow the authors to make their claims in this paper, they also have serious implications for real world use as rational activation functions are not arbitrarily precise in physically realizable machines in finite time and you will need to find a well-ordering of your data or find a way to force one one it which is not a trivial task.

So while interesting, just as it was interesting when someone demonstrated sendmail configurations were Turing complete, it probably isn't as practical as you seem to think of it.

As attention is really runtime re-weighting and as feed forward networks are similar to DAGs it is not surprising to me that someone found a way to prove this, but just as I am not going to use the C preprocessor as a universal computation tool as it is also TC, I wouldn't hold your breath waiting for attention to be a universal computation tool either.