
I agree that in this case the animated parts of the graphics were not needed; it's an easy pitfall to be distracted by the beautiful aspects of visualisations when crafting them.

I feel the need to defend the author, though: it's hard to make research accessible while still distilling valuable insight. I think his post on transformer networks [1] did a good job, for example, and you'll appreciate the lack of animations.

[1] https://jalammar.github.io/illustrated-transformer/




Yes, this seems like an early work in progress compared to Jay's previous Transformer articles.

In addition to your link, I found a really good Transformer explanation here (backed by a GitHub repo with lively discussion in its Issues): http://www.peterbloem.nl/blog/transformers

Additionally, there's a paper on visualizing self-attention: https://arxiv.org/pdf/1904.02679.pdf
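For anyone wondering what those visualizations actually plot: it's usually the softmax attention matrix itself, where each row shows how much one token attends to every other token. A minimal single-head sketch in NumPy (function name and shapes are my own, not from the paper):

    import numpy as np

    def self_attention(x, w_q, w_k, w_v):
        # x: (seq_len, d_model); w_q/w_k/w_v are plain projection matrices.
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        d_k = q.shape[-1]
        scores = q @ k.T / np.sqrt(d_k)            # (seq_len, seq_len)
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)  # softmax over the keys
        return weights @ v, weights                # `weights` is what gets plotted

    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 16))
    w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
    out, attn = self_attention(x, w_q, w_k, w_v)
    print(attn.round(2))  # each row sums to 1; a heatmap of this matrix
                          # is the typical attention visualization
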


Can't edit the post anymore, so adding it here. Further reading on improving the current attention model: https://www.reddit.com/r/MachineLearning/comments/hxvts0/d_b...


That's a good complement; thank you for the links.



