Hacker News
Understanding LSTM and its diagrams (medium.com/mlreview)
125 points by adamnemecek on Nov 15, 2017 | 10 comments



Pro Tip: The GRU is a much simpler unit (easier to explain) and usually works better than the LSTM.


Interesting paper that compares performance of GRU vs LSTM. http://proceedings.mlr.press/v37/jozefowicz15.pdf

TL;DR: for most tasks other than language modeling, the GRU is better.


Why is the LSTM better for language modeling?


The paper doesn't say much about why; the results are empirical.


What do you mean by "works better"? Do you mean something beyond having fewer parameters to store/calculate?


Usually the GRU requires fewer epochs to train than the LSTM.

The learning capacity of the LSTM and GRU is proportional to the number of weights, so you need more GRUs to do the job of a given number of LSTMs.
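The parameter-count difference comes from the gate structure: the LSTM has four weighted blocks (three gates plus the cell candidate), the GRU only three (two gates plus the candidate). A minimal back-of-the-envelope sketch, assuming the common layout of input weights, recurrent weights, and a bias per block (the sizes 128 and 256 are just example dimensions):

```python
def rnn_param_count(n_in, n_hidden, n_blocks):
    """Rough parameter count for a gated RNN layer.
    Each gate/candidate block has: input weights (n_in x n_hidden),
    recurrent weights (n_hidden x n_hidden), and one bias vector."""
    per_block = n_in * n_hidden + n_hidden * n_hidden + n_hidden
    return n_blocks * per_block

# LSTM: input, forget, output gates + cell candidate = 4 blocks
lstm_params = rnn_param_count(128, 256, 4)
# GRU: update, reset gates + candidate = 3 blocks
gru_params = rnn_param_count(128, 256, 3)
```

So for the same hidden size, a GRU layer has about three-quarters of the weights of an LSTM layer, which is consistent with the point above that matching capacity requires more GRU units.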

It drives me nuts to see so many tutorials use the LSTM just because other people used the LSTM, when doing the same thing with the GRU is simpler and better!


The reason is that the GRU has only two gates and no internal cell state like the LSTM's. Since it has fewer parameters than the LSTM, it takes less time to train.
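To make the "two gates, no separate cell state" point concrete, here is a minimal NumPy sketch of one GRU step in its standard formulation (the dict-of-weights layout `W`, `U`, `b` is just for illustration; real implementations pack these into fused matrices, and some references swap the roles of `z` and `1 - z`):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU time step. Unlike the LSTM, the hidden state h is the
    only state carried forward -- there is no separate cell state c."""
    z = sigmoid(W['z'] @ x + U['z'] @ h + b['z'])        # update gate
    r = sigmoid(W['r'] @ x + U['r'] @ h + b['r'])        # reset gate
    h_tilde = np.tanh(W['h'] @ x + U['h'] @ (r * h) + b['h'])  # candidate
    return (1 - z) * h + z * h_tilde  # interpolate between old and candidate
```

An LSTM step, by contrast, would need three gates plus a candidate and would return both an updated hidden state and an updated cell state.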


You mean train to the same performance plateau?



> Are reposts ok?

> If a story has had significant attention in the last year or so, we kill reposts as duplicates. If not, a small number of reposts is ok.

https://news.ycombinator.com/newsfaq.html



