In my testing, both Llama 3 and its abliterated (uncensored) variant from [0] almost always remarked, more or less directly, that they saw the joke in the phrase — so either they had seen the other meaning in training, or they inferred it.
Oh, I agree it probably inferred the joke. I was actually more surprised that it knew the real meaning of the phrase, because I, as a human, did not — until I looked it up and saw how common it is.
Please use the word "ablated" instead. That article's title is not using a real word. I assume it's an English issue on the author's part, since they also called the model "helpfull" instead of "helpful".
--
[0] https://news.ycombinator.com/item?id=40665721