Hacker News

Ok, what FlashAttention changes is the space complexity: from O(N^2) to O(N). The time complexity is still O(N^2), as with standard self-attention.

In other words, it optimizes practical runtime through I/O reduction (fewer reads and writes between GPU HBM and on-chip SRAM) without altering the asymptotic time complexity.
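The memory trick is the online softmax: process K/V in blocks and keep only O(N) running statistics (row max and denominator), never the full N x N score matrix. A minimal NumPy sketch of the idea, not the real fused CUDA kernel (function names here are just for illustration):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full N x N score matrix: O(N^2) memory.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # FlashAttention-style online softmax: streams K/V in blocks,
    # keeping only O(N) running statistics. Note the double loop
    # structure: every query still meets every key, so time is
    # still O(N^2).
    N, d = Q.shape
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running row max
    l = np.zeros(N)           # running softmax denominator
    scale = 1.0 / np.sqrt(d)
    for j in range(0, N, block):
        Kj, Vj = K[j:j+block], V[j:j+block]
        S = Q @ Kj.T * scale                 # only N x block scores live at once
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)            # rescale previously accumulated terms
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        O = O * alpha[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V))
```

Both versions compute the same result; the tiled one just never holds more than an N x block slice of the scores, which is the O(N^2) -> O(N) space win.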
