FlashAttention: Fast Transformer training with long sequences (adept.ai)
148 points by kristianp 9 months ago | 10 comments



The same author, Tri Dao, released FlashAttention-2 in July.

https://together.ai/blog/tri-dao-flash-attention


Here is a recent interview with the author of FlashAttention, Tri Dao:

https://www.youtube.com/watch?v=J4-qZ6KBalk


It's insane that FlashAttention was released 16 months ago. It feels like a decade.


Has anybody used FlashAttention in their model? Are there any benchmark numbers on the quality impact?


The result is identical to regular attention in transformers, but training can be about four times faster, so there is almost no reason not to use it.
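
For anyone who wants to sanity-check that claim, here's a minimal sketch (assuming PyTorch 2.x on a CUDA GPU, where F.scaled_dot_product_attention can dispatch to the FlashAttention kernel) comparing the flash path against the plain math path; the shapes and tolerance below are just illustrative:

    import torch
    import torch.nn.functional as F

    # Illustrative shapes; the flash kernel wants fp16/bf16 on a CUDA GPU.
    q, k, v = (torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
               for _ in range(3))

    # Force the FlashAttention backend.
    with torch.backends.cuda.sdp_kernel(enable_flash=True,
                                        enable_math=False,
                                        enable_mem_efficient=False):
        out_flash = F.scaled_dot_product_attention(q, k, v)

    # Force the plain (unfused) math backend for comparison.
    with torch.backends.cuda.sdp_kernel(enable_flash=False,
                                        enable_math=True,
                                        enable_mem_efficient=False):
        out_math = F.scaled_dot_product_attention(q, k, v)

    # The two paths agree to within fp16 rounding.
    print(torch.allclose(out_flash, out_math, atol=1e-3))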


Not quite. There can be non-deterministic race conditions, and it has some strange head-size and sequence-length requirements.


Yes. For a model within the limits of the head requirements, however, you wouldn't be able to see a quality difference from regular attention. Non-determinism is a price paid for performance; regular transformer implementations can suffer from it too, depending on how they're written.


It’s basically a way of making more efficient use of memory transfers during the calculation of the attention blocks in a transformer. You transfer a block at a time, increasing inference throughput because less time is spent overall fetching things from slow memory.
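
In case it helps, a toy re-implementation of that blocked idea (single head, plain PyTorch with an online softmax; the tile size and shapes are made up, and the real kernel does this tiling in on-chip SRAM with a CUDA kernel) looks roughly like:

    import math
    import torch

    def blocked_attention(q, k, v, block=64):
        # q, k, v: (seq_len, head_dim) for one head; illustrative only.
        n, d = q.shape
        scale = 1.0 / math.sqrt(d)
        out = torch.empty_like(q)
        for i in range(0, n, block):
            qi = q[i:i + block]                              # query tile
            m = torch.full((qi.shape[0],), float("-inf"), dtype=q.dtype)
            l = torch.zeros(qi.shape[0], dtype=q.dtype)
            acc = torch.zeros_like(qi)
            for j in range(0, n, block):                     # stream K/V tiles
                s = (qi @ k[j:j + block].T) * scale
                m_new = torch.maximum(m, s.max(dim=1).values)
                p = torch.exp(s - m_new[:, None])
                corr = torch.exp(m - m_new)                  # rescale old partials
                l = l * corr + p.sum(dim=1)
                acc = acc * corr[:, None] + p @ v[j:j + block]
                m = m_new
            out[i:i + block] = acc / l[:, None]              # final normalization
        return out

    # Quick check against the straightforward full-matrix attention.
    q, k, v = (torch.randn(256, 64) for _ in range(3))
    ref = torch.softmax((q @ k.T) / math.sqrt(64), dim=-1) @ v
    print(torch.max((blocked_attention(q, k, v) - ref).abs()))   # ~1e-6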


Also, isn't the author Tri Dao at Together AI now as their chief scientist?


Related:

FlashAttention-2, 2x faster than FlashAttention - https://news.ycombinator.com/item?id=36761988 - July 2023 (18 comments)

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness - https://news.ycombinator.com/item?id=31568090 - May 2022 (3 comments)



