Operation Type|Mamba Complexity|Transformer Complexity
Training(per iteration)|O(L)|O(L^2)
Autoregressive Inference(per step)|O(T)|O(L)
Memory Requirements|O(C)|O(L)
Where: L stands for the sequence length. T denotes a fixed constant that accounts for compression and selection time in Mamba's autoregressive inference. C reflects the fixed size of the SSM (State Space Model) latent state in Mamba
Per: https://github.com/state-spaces/mamba/issues/196
Operation Type|Mamba Complexity|Transformer Complexity
Training(per iteration)|O(L)|O(L^2)
Autoregressive Inference(per step)|O(T)|O(L)
Memory Requirements|O(C)|O(L)
Where: L stands for the sequence length. T denotes a fixed constant that accounts for compression and selection time in Mamba's autoregressive inference. C reflects the fixed size of the SSM (State Space Model) latent state in Mamba
Per: https://github.com/state-spaces/mamba/issues/196