There are a thousand papers out there making minor tweaks to the transformer architecture. 99% of them are also worthless and forgotten.
That's precisely why he shared it, though: so that someone willing to train a model with this tweak will try it.