The title is a reference to the famous machine learning paper "Attention Is All You Need", which introduced the concept of transformers. Transformers have revolutionized how we process sequential data (e.g., natural language processing).
And recently, a paper titled "Attention Is Not All You Need" has made the rounds, arguing that some of the claims made in the AIAYN paper may have been overstated: https://arxiv.org/abs/2103.03404
If you read the full title, it only refers to the multi-head attention part of BERT, excluding the feed-forward and skip connections, hence the phrase "pure attention".
> Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
This does not prove the original title was wrong, and the paper is not a rebuttal, but an analysis of a submodule that helps us better understand transformers.
Also: while cute, I found the examples of storing and retrieving images of The Simpsons characters not very informative about what goes on in the weight matrix that stores the patterns.
I think it's correct, in the same way that you can say "Rolling Stones is a great band". It's about the tech called "Hopfield Networks", not about any particular number of networks that are all you need.
Not doubting what is officially grammatically correct, but that still sounds really weird to me. Like with sports teams I would only ever say "The Patriots are a good team" or "New England is a good team". Not "The Patriots is a good team".
In any event, the authors definitely chose that title as a callback to the well known paper "Attention is all you need", which introduced Transformers. So that probably influenced their decision to use "is" instead of "are".
Consider ‘My team is a good team’ vs. ‘My team are a good team’.
I bet ‘is’ sounds better to you in this context, though ‘my team’ and ‘The Patriots’ are similar noun phrases that could refer to exactly the same thing.
The difference is that ‘Patriots’ is plural. Replace it with Manchester United and ‘is’ sounds good again.
Yeah, it's definitely caused by the team name being plural, or at least sounding plural; I've never heard anyone say "The Red Sox is good" either. Regardless of what is technically grammatically correct, I think real-life usage has pretty much settled on that convention, at least in the US.
Something odd: while "The Red Sox are John's favorite team." seems more natural than "The Red Sox is John's favorite team.", phrasing it in the opposite order, "John's favorite team is the Red Sox." seems more natural than "John's favorite team are the Red Sox."
This seems like a strange discrepancy. Why is this the case? Maybe it is because "favorite team" is clearly singular and sits closer in the sentence to the "is"/"are" than the plural-sounding "Red Sox". Or maybe whichever noun phrase comes first determines how "to be" is conjugated?
Hm, but what if instead of connecting a noun phrase (determiner phrase?) like "The Red Sox" to another noun phrase (determiner phrase) "John's favorite team", we instead connect it to an adjective?
"The Red Sox are singular.", "The Red Sox is singular.", "Singular is The Red Sox." "Singular are The Red Sox." . Well, the "[Adjective] is [noun]" is kind of an unusual thing to say unless one is trying to sound like one is quoting poetry or yoda or something, but to the degree that either of them sound ok, I think "Singular are The Red Sox." sounds better than "Singular is The Red Sox." . Though, in this case, there doesn't seem to be anything grammatically suggested by the adjective that the thing be in the singular case (maybe I shouldn't have used "singular" as the adjective..) .
Hm, what if instead of "John's favorite team [is/are] the Red Sox.", we instead look at "John's favorite [is/are] the Red Sox."? In this case, it seems less clear which is more natural? They seem about the same to me (but that might just be me, idk).
You're totally right; I would definitely always say "my favorite team is", and probably also would say "my favorite is". I think the subject of the sentence is what determines it grammatically, and since that's what you say first, it makes sense it would affect your choice of verb more.
I actually think this could extend to a lot of situations where the object is referring to a single group, not just plural-sounding proper nouns. Like if asked "what was your favorite zoo exhibit?", I would probably respond "my favorite was the giraffes" not "my favorite were the giraffes". I'm actually not even sure what the correct response would be technically though. "My favorites were the giraffes" implies multiple favorites, and "my favorite was the giraffe" makes it sound like the exhibit had a single giraffe. So it feels like subject/object have to mismatch then.
Quoting: "We introduce a new energy function and a corresponding new update rule which is guaranteed to converge to a local minimum of the energy function."
Is this a minimum in a local area of the domain, or local in the range of some function? I could see that being an advantage if you happen to know that local part of the range.
In contrast, we're usually looking for a global min/max, say with annealing algorithms. How is local better than global in the context of this paper?
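For intuition, here is a minimal NumPy sketch of the softmax-based update rule described in the paper (beta and the pattern setup are illustrative choices, not values from the paper). "Local minimum" here means the iteration converges to a fixed point near whichever stored pattern the query starts closest to, rather than to one global optimum over all patterns:

    # Sketch of the modern Hopfield update rule: iterating from a noisy
    # query converges to the nearest stored pattern, i.e. a local minimum
    # of the energy, not a single global minimum.
    import numpy as np

    rng = np.random.default_rng(0)
    beta = 8.0                                  # inverse temperature (illustrative)
    X = rng.choice([-1.0, 1.0], size=(64, 5))   # 5 stored patterns, 64 dims each

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def update(xi):
        # xi_new = X softmax(beta * X^T xi)
        return X @ softmax(beta * (X.T @ xi))

    # Start from a corrupted copy of pattern 0 and iterate to a fixed point.
    xi = X[:, 0] + 0.5 * rng.standard_normal(64)
    for _ in range(10):
        xi = update(xi)

    print(np.allclose(np.sign(xi), X[:, 0]))    # True: pattern 0 retrieved

Retrieval lands in the basin of the nearest stored pattern; a "global" minimum over all patterns would not even be the desired behavior for an associative memory.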
I’ve seen a lot of efforts to add a notion of associative memory into neural networks. Have any exciting applications of such architectures been publicised?
Just a few days ago, researchers from Peking U and Microsoft published a paper[0] saying they can access "knowledge neurons" in pretrained models that will enable "fact editing"[1].
Nice paper! I used Hopfield networks in the 1980s. I hope I can clear a few hours of time this week to work through this. I admit that, for machine learning, I have fallen into the "deep learning for everything" pit over the last six or seven years, probably because DL is what I usually get paid for.
Further off-topic, but do people actually consider this to be beautiful design? Looks like a rendered markdown document with MathJax and green headers. Perfectly appropriate for the content of the post, but beautiful isn't the first word that comes to mind for me.
I really wish I could literally just dump LaTeX onto the web and be done with it. Everything I've tried either doesn't work properly / isn't 1:1 (Pandoc is cute), or does work but yields enormous amounts of HTML (pdf2htmlEX).
I am fairly happy with [insert MD->Book tool of your choice], but sometimes I want citations and things like that.
1. Hopfield networks are also known as "associative memory" networks, a neural network model developed decades ago by John Hopfield.
2. It can be useful to plug these in as layers in deep neural networks today (the authors provide a PyTorch implementation); see the sketch below.
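To make (2) concrete, here is a minimal hand-rolled PyTorch sketch of a Hopfield-style retrieval layer, i.e. one step of the update rule, which is why it looks exactly like attention. The class and parameter names are illustrative assumptions, not the API of the authors' hopfield-layers library:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HopfieldRetrieval(nn.Module):
        """One Hopfield update step: retrieve from a set of stored patterns."""
        def __init__(self, dim, beta=1.0):
            super().__init__()
            self.beta = beta
            self.q = nn.Linear(dim, dim)   # projects the state/query
            self.k = nn.Linear(dim, dim)   # projects the stored patterns
            self.v = nn.Linear(dim, dim)   # projects what gets retrieved

        def forward(self, query, stored):
            # query: (batch, n_queries, dim), stored: (batch, n_patterns, dim)
            scores = self.q(query) @ self.k(stored).transpose(1, 2)
            attn = F.softmax(self.beta * scores, dim=-1)
            return attn @ self.v(stored)   # one retrieval/update step

    layer = HopfieldRetrieval(dim=32)
    out = layer(torch.randn(2, 4, 32), torch.randn(2, 10, 32))
    print(out.shape)   # torch.Size([2, 4, 32])

The correspondence with transformer attention is the paper's central point: a single Hopfield update with a softmax energy is structurally the same computation as an attention head.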
I hate non-informative titles!