This is an article about "S4", a sequence model built on "Structured State Spaces":

https://srush.github.io/annotated-s4/

The matrices that make up the state space model (A, B and C) are constant in S4: they do not change with the input or the time step. That is what lets the step-by-step recurrence be unrolled into a single convolution over the whole sequence, which can be computed in parallel.
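
To make the recurrence/convolution equivalence concrete, here is a minimal pure-Python sketch. It assumes an already-discretized model with a diagonal A (so A, B and C are just lists of numbers) and a zero initial state; the function names and the toy values are made up for illustration and are not taken from the S4 code.

```python
# Discrete, time-invariant SSM with a diagonal A (an assumption made for brevity):
#   x_k = A * x_{k-1} + B * u_k      (elementwise)
#   y_k = sum(C * x_k)

def ssm_recurrent(A, B, C, u):
    """Run the SSM one step at a time (inherently sequential)."""
    x = [0.0] * len(A)
    ys = []
    for u_k in u:
        x = [a * x_i + b * u_k for a, x_i, b in zip(A, x, B)]
        ys.append(sum(c * x_i for c, x_i in zip(C, x)))
    return ys

def ssm_kernel(A, B, C, length):
    """Kernel K[j] = sum_i C_i * A_i**j * B_i. Only valid because A, B, C are fixed."""
    return [sum(c * (a ** j) * b for a, b, c in zip(A, B, C))
            for j in range(length)]

def ssm_convolution(A, B, C, u):
    """Same outputs as a causal convolution of u with K; each y_k is independent."""
    K = ssm_kernel(A, B, C, len(u))
    return [sum(K[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]

# Toy values, chosen arbitrarily.
A = [0.9, 0.5]
B = [1.0, 2.0]
C = [0.3, -0.7]
u = [1.0, 0.0, 2.0, -1.0]

y_rec = ssm_recurrent(A, B, C, u)
y_conv = ssm_convolution(A, B, C, u)
assert all(abs(p - q) < 1e-9 for p, q in zip(y_rec, y_conv))
```

The kernel depends only on A, B and C, so it can be computed once and reused for every position. The naive convolution above is quadratic in the sequence length; it is only meant to show that the two forms agree, not how S4 computes the kernel efficiently.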

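The difference between S4 and Mamba is that in Mamba the parameters are input-dependent: B, C and the step size Δ are computed from the current input, so the discretized matrices change at every time step. That rules out the convolution trick, so they add a hardware-aware CUDA implementation of a "parallel scan" to keep it fast on a GPU even though the matrices are no longer constant.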
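
Why a scan still works when the coefficients change per step: each step is a map x -> a_k * x + b_k, and composing two such maps gives another map of the same form, so the composition is associative. Below is a rough pure-Python sketch with a scalar state. The way a_k and b_k are derived from the input here (a softplus step size) is a made-up stand-in, and the recursive-doubling scan is a textbook variant, not Mamba's actual fused CUDA kernel.

```python
import math

def combine(left, right):
    """Compose two steps x -> a*x + b, applying `left` first, then `right`."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)

def sequential_states(steps):
    """Reference: plain loop over the time-varying recurrence, starting from x = 0."""
    x, out = 0.0, []
    for a, b in steps:
        x = a * x + b
        out.append(x)
    return out

def scan_states(steps):
    """Inclusive scan by recursive doubling: about log2(n) rounds, and all
    combines inside one round are independent, so they could run in parallel."""
    acc = list(steps)
    shift = 1
    while shift < len(acc):
        acc = [acc[i] if i < shift else combine(acc[i - shift], acc[i])
               for i in range(len(acc))]
        shift *= 2
    # With a zero initial state, the state after step i is the b-part of the prefix.
    return [b for _, b in acc]

def softplus(z):
    return math.log1p(math.exp(z))

# Hypothetical input-dependent coefficients: delta_k = softplus(u_k),
# a_k = exp(-delta_k), b_k = delta_k * u_k.
u = [1.0, -0.5, 2.0, 0.3, -1.2]
steps = [(math.exp(-softplus(u_k)), softplus(u_k) * u_k) for u_k in u]

x_seq = sequential_states(steps)
x_scan = scan_states(steps)
assert all(abs(p - q) < 1e-9 for p, q in zip(x_seq, x_scan))
```

Here the list comprehension inside each round runs serially; on a GPU the point is that those combines have no dependencies on each other within a round.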

Yannic Kilcher's video on Mamba might also be a good resource: https://youtu.be/9dSkvxS2EB0