For what it’s worth, someone pointed out that PyTorch has an optional workaround for it in the Multihead Attention API. But yes, having to skip over 200 off-topic, ranting comments was mildly annoying (to me).

