
My IQ is orders of magnitude lower than that of the paper's authors, but I did my best to work through it anyway. I studied CE, so I have a basic control theory background and undergrad-level intuition for discrete-time systems. It would take a lot more studying for me to understand state space models well enough to really parse this paper. Take my comment here with a big grain of salt.

The overall insight of Mamba is a fix for a longstanding problem with state space models. They are good at compressing the input context, but squeezing the input into a fixed-size hidden state discards information the model needs in order to use the context as effectively as Transformers do.

Their solution to this problem is what they call a selection mechanism. The mechanism is input-dependent, which lets the model adjust what it keeps and what it outputs at each step as the input changes. They do this by making a few of the state space parameters input-dependent instead of input-invariant: they pick a few of those parameters and attach linear layers (and the like) that project the input onto each parameter at every time step. Those layers are trained along with the rest of the model, so they learn to transform the input in a way that makes the model produce useful output.
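For what it's worth, here is a minimal sketch of how I picture the selection mechanism, in plain Python. It is not the paper's implementation. It assumes a single scalar input channel, a diagonal state matrix A that stays fixed, and a step size (delta) plus input and output projections (B and C) that are recomputed from the input at every step. The function name selective_scan, the weight names (w_delta, b_delta, w_B, w_C), and the simplified discretization are all my own illustrative choices.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, A, w_delta, b_delta, w_B, w_C):
    """Run a toy single-channel selective SSM over the scalar sequence xs.

    A, w_B and w_C are lists of length N (the state size). Every weight
    here is a hypothetical stand-in for a trained linear layer.
    """
    h = [0.0] * len(A)  # hidden state, starts empty
    ys = []
    for x in xs:
        # Input-dependent parameters: this is the "selection" part.
        delta = softplus(w_delta * x + b_delta)
        B = [w * x for w in w_B]
        C = [w * x for w in w_C]

        # Simplified discretization (an assumption of this sketch):
        # exponential for A, plain delta scaling for B.
        A_bar = [math.exp(delta * a) for a in A]
        B_bar = [delta * b for b in B]

        # Recurrence: decay the old state, then write the new input into it.
        h = [ab * hi + bb * x for ab, hi, bb in zip(A_bar, h, B_bar)]
        # Readout: project the state back to a scalar output.
        ys.append(sum(c * hi for c, hi in zip(C, h)))
    return ys
```

With input-invariant delta, B and C this loop collapses to an ordinary linear recurrence. Because they change with x, the model can decide per token how much old state to keep (via delta) and how strongly to write the new input (via B).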

But making the state space parameters input-dependent creates a problem in terms of computation overhead. They fix that by designing a hardware-aware algorithm that makes the most of the modern GPU memory hierarchy, avoiding moving things in and out of HBM as much as possible.

Tri Dao, one of the authors, came up with FlashAttention, which is basically a way to use the hardware more efficiently in a Transformer. So this is his jam 100%.

I know this doesn’t add much to understanding the paper, but hopefully it’s better than nothing.




Is this similar to subset selection with the concrete distribution?


I don’t know enough to answer your question, sadly.



