For a while now, an answer I've seen is to start with "Attention Is All You Need", the original Transformers paper. It's still pretty good, but over the past year I've led a few working sessions on grokking transformer computational fundamentals and they've turned up some helpful later additions that simplify and clarify what's going on.

You can quickly get overwhelmed by the million good resources out there, so I'll keep it to these three. If you have a strong CS background, they'll take you a long way; a minimal code sketch of the attention computation they all build toward follows the list:

(1) Transformers from Scratch: https://peterbloem.nl/blog/transformers

(2) Attention Is All You Need: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547de...

(3) Formal Algorithms for Transformers: https://arxiv.org/abs/2207.09238
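
If you want the core computation all three build toward in one screen, here's a minimal NumPy sketch of single-head scaled dot-product attention. The toy dimensions and random weights are placeholders of my own, and it leaves out masking, multi-head splitting, and positional encodings:

    # Single-head scaled dot-product attention, as in "Attention Is All You Need".
    # Toy sizes and random weights here are illustrative placeholders only.
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for numerical stability
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, Wq, Wk, Wv):
        # X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_k) projection matrices
        Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project inputs to queries, keys, values
        scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity of each query to every key, scaled
        weights = softmax(scores, axis=-1)        # each row is a distribution over positions
        return weights @ V                        # weighted mix of value vectors

    rng = np.random.default_rng(0)
    seq_len, d_model, d_k = 4, 8, 8               # tiny sizes, just for the demo
    X = rng.normal(size=(seq_len, d_model))
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)    # (4, 8): one output vector per position

Everything else in the architecture (multi-head splitting, feed-forward blocks, layer norm, residual connections) is scaffolding around this one operation.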




YouTube channel that explains the paper in detail: https://youtu.be/iDulhoQ2pro

And subsequent follow-ups (ROME, editing transformer arch): https://youtu.be/_NMQyOu2HTo

I find the channel amazing at explaining super complex topics in simple enough terms for people who have some background in AI.


Yannic Kilcher is great but this video worked better for me:

"LSTM is dead. Long live transformers!" (Leo Dirac): https://www.youtube.com/watch?v=S27pHKBEp30


I second the recommendation for Peter Bloem’s tutorial.

I’m also about to read the transformer chapter from this excellent upcoming book by Simon Prince:

Understanding Deep Learning (udlbook): https://udlbook.github.io/udlbook/


Part of the problem with self-studying this stuff is that it's hard to know which resources are good without already being at least conversant with the material.


That problem doesn't really disappear with teachers and classes ;)



