Explaining RNNs without neural networks (explained.ai)
126 points by parrt on July 10, 2020 | 7 comments



Vanilla recurrent neural networks (RNNs) form the basis of more sophisticated models, such as LSTMs and GRUs. There are lots of great articles, books, and videos that describe the functionality, mathematics, and behavior of RNNs so, don't worry, this isn't yet another rehash. (See below for a list of resources.) My goal is to present an explanation that avoids the neural network metaphor, stripping it down to its essence—a series of vector transformations that result in embeddings for variable-length input vectors.
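A minimal sketch of that idea (not the article's code, and the names W, U, b, rnn_embed are illustrative assumptions): a vanilla RNN forward pass viewed purely as repeated vector transformations, where each input vector nudges a running state vector h, and the final h is the fixed-size embedding of the variable-length sequence.

  import numpy as np

  def rnn_embed(X, W, U, b):
      """X: sequence of input vectors (T x d); W: (k x k); U: (k x d); b: (k,).
      Returns the final hidden state h (k,) as the sequence embedding."""
      h = np.zeros(W.shape[0])
      for x_t in X:                      # one vector transformation per time step
          h = np.tanh(W @ h + U @ x_t + b)
      return h

  # Example: embed a 5-step sequence of 3-dim vectors into a single 4-dim vector.
  rng = np.random.default_rng(0)
  X = rng.normal(size=(5, 3))
  W, U, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4)
  print(rnn_embed(X, W, U, b).shape)     # (4,)

Note the loop handles any sequence length T, which is why the final state works as an embedding for variable-length input.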


Thank you for this. As someone who understands linear algebra well, I feel the neural network formalism obfuscates the core meaning of these processes. It's good for describing composition, but not good for understanding the parts in between.

Looking forward to your take on LSTMs and transformers ;).

Many thanks!

Great contribution.


It is so easy to get bogged down in the mathematics of RNNs that new learners lose perspective of the dynamics. I love that you have flipped that around.

I favor the approach of understanding the dynamics of machine learning. The mathematics are then easily distilled from the nature of the process.

This is a very clear description of that process. Thank you for it!


Glad to be of service. Yeah, I just couldn't see what RNNs were doing through all of the neural net stuff. Much easier to think of it as just gradually transforming a vector. I also can't see linear algebra w/o visualizing different size boxes :)


This is really great. I'm so glad that someone took the time to explain what is really going on. Thanks...hopefully you'll do more!


Great article, thanks.


hella explanation! Even I can understand it



