Hacker News

Ok, what FlashAttention changes is the space complexity: from O(N^2) to O(N). The time complexity is still O(N^2), as with standard self-attention.

In other words, it optimizes practical runtime through I/O reduction (fewer reads and writes between GPU HBM and on-chip SRAM) without altering the asymptotic time complexity.
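The memory trick is the online softmax: process K/V in blocks and keep only O(N) running statistics (row max and denominator), never the full N x N score matrix. A minimal NumPy sketch of the idea, not the real fused CUDA kernel (function names here are just for illustration):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full N x N score matrix: O(N^2) memory.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # FlashAttention-style online softmax: streams K/V in blocks,
    # keeping only O(N) running statistics. Note the double loop
    # structure: every query still meets every key, so time is
    # still O(N^2).
    N, d = Q.shape
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running row max
    l = np.zeros(N)           # running softmax denominator
    scale = 1.0 / np.sqrt(d)
    for j in range(0, N, block):
        Kj, Vj = K[j:j+block], V[j:j+block]
        S = Q @ Kj.T * scale                 # only N x block scores live at once
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)            # rescale previously accumulated terms
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        O = O * alpha[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V))
```

Both versions compute the same result; the tiled one just never holds more than an N x block slice of the scores, which is the O(N^2) -> O(N) space win.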
