He's questioning the statement: "I don't think [the trick] ever did very much", ...

danielmarkbruce · on July 25, 2023

Is he? A surface-level reading suggests he's asking "how would you know?", and the answer is: by looking at the parameters. People do that.

>> because no one has yet looked at whether the trick helps reducing outliers in very large models

Given that a softmax version doing exactly what the blog post describes is baked into a Google library (see this thread), and that you can set it as a parameter in a PyTorch model (see this thread), this claim seems off. "Let's try X, oh, X doesn't do much, let's not write a paper about it" is extremely common for many X.
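For anyone who hasn't read the post: as I understand it, the proposed variant just adds 1 to the softmax denominator, i.e. exp(x_i) / (1 + sum_j exp(x_j)), so a head can put close to zero total weight on everything. That is the same as appending one extra logit fixed at 0 and discarding its output. A rough pure-Python sketch follows; the function names are my own for illustration and are not taken from any library mentioned in this thread.

```python
import math

def softmax(logits):
    # Standard softmax, shifted by the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_plus_one(logits):
    # Hypothetical name. Computes exp(x_i) / (1 + sum_j exp(x_j)).
    # The shift m also covers the implicit extra logit at 0, so the
    # "+1" term becomes exp(0 - m) after shifting.
    m = max(max(logits), 0.0)
    exps = [math.exp(x - m) for x in logits]
    denom = math.exp(-m) + sum(exps)
    return [e / denom for e in exps]

def softmax_with_zero_logit(logits):
    # Equivalent formulation: append a logit of 0, apply the ordinary
    # softmax, then drop the weight assigned to the extra slot.
    return softmax(list(logits) + [0.0])[:-1]

if __name__ == "__main__":
    scores = [-10.0, -10.0, -10.0]
    print(sum(softmax(scores)))           # ordinary softmax must spend all its weight
    print(sum(softmax_plus_one(scores)))  # the variant can spend almost none
```

With strongly negative scores the ordinary softmax still has to distribute a total weight of 1, while the variant's weights sum to nearly 0. Whether that actually reduces outliers in large models is the open question being argued above.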

tudorw · on July 25, 2023

This would seem like a really good argument for why failures should be written up. Otherwise, where is the list of what has been tried before?

danielmarkbruce · on July 25, 2023

Yup, it is. But it isn't going to happen.