Usually the GRU requires fewer epochs to train than the LSTM.
The learning capacity of an LSTM or GRU layer is roughly proportional to its number of weights, so you need more GRU units to match the capacity of a given number of LSTM units.
It drives me nuts to see so many tutorials where people use the LSTM just because they saw other people use the LSTM, when doing the same thing with the GRU is simpler and often works just as well!
The reason is that the GRU has only two gates (reset and update) and no separate cell state like the LSTM has. With fewer parameters than the LSTM, it takes less time to train.
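To make the parameter gap concrete, here is a minimal sketch (assuming PyTorch) that compares the parameter counts of an LSTM and a GRU layer with the same input and hidden sizes. The sizes 128 and 256 are arbitrary choices for illustration; the ratio is what matters.

```python
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

input_size, hidden_size = 128, 256  # arbitrary example sizes

lstm = nn.LSTM(input_size, hidden_size)  # 4 gate blocks: input, forget, cell, output
gru = nn.GRU(input_size, hidden_size)    # 3 blocks: reset, update, candidate

print(f"LSTM parameters: {count_params(lstm):,}")  # 4 * (in*h + h*h + 2h)
print(f"GRU  parameters: {count_params(gru):,}")   # 3 * (in*h + h*h + 2h)
print(f"GRU/LSTM ratio:  {count_params(gru) / count_params(lstm):.2f}")  # ~0.75
```

Because the GRU computes three weight blocks where the LSTM computes four, a GRU of the same hidden size carries about 75% of the LSTM's weights, which is where the faster training comes from.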