I'm not an expert. Do you have any links about this, i.e. a neural net construction that outperforms a transformer model of the same size?
https://arxiv.org/abs/2310.16764