
One can view RNNs as a sort of generalization of Markov chains. RNNs have the advantages of memory and context tracking, and are not limited to learning patterns of some fixed length. RNNs can apply these advantages to learn the subtleties of grammar, balanced parentheses, the proper use of punctuation, and other things that a Markov chain might never learn (and certainly not memory-efficiently). For any given piece of text, RNNs can be said to have gotten closer to understanding what was consumed.
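To make the comparison concrete, here is a minimal sketch of the kind of "fast and simple Markov chain" being discussed: an order-2 character model. Everything here (the toy corpus, function names) is illustrative, not from any particular library. Note how the state is just the last two characters, so the model can never track anything longer-range, such as an open parenthesis from ten characters back.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each length-`order` state to the characters observed after it."""
    chain = defaultdict(list)
    for i in range(len(text) - order):
        state = text[i:i + order]             # current state: last `order` chars
        chain[state].append(text[i + order])  # observed next characters
    return chain

def generate(chain, seed, length=40):
    """Sample text by repeatedly choosing a next char for the current state."""
    out = list(seed)
    for _ in range(length):
        state = "".join(out[-len(seed):])
        candidates = chain.get(state)
        if not candidates:
            break
        out.append(random.choice(candidates))
    return "".join(out)

corpus = "the cat sat on the mat. the cat ate the rat."  # hypothetical toy data
chain = build_chain(corpus)
print(generate(chain, "th"))
```

Sampling is a couple of dictionary lookups per character, which is why the practical-settings argument below often favors the Markov chain.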

The other question is: are those difficult-to-learn things truly worth the cost of training and running an RNN? If a fast and simple Markov chain serves, as is likely the case in practical settings, then it is better to go with the Markov chain. The RNN will still make obvious mistakes, all while correctly applying subtle rules that trouble even humans. Unfortunately, this combination is exactly the kind of thing that will leave observers less than impressed: "Yes, I know it rambles insensibly, but look, it uses punctuation far better than your average forum dweller!" Alas, anyone who has gone through the trouble of making a Gouraud-shaded triangle spin in Mode X and proudly shown it to their childhood friends can explain just what sort of reaction to expect.

Eh, so, the moral here is to pay attention to cost-effectiveness and not make things any more complicated than they need to be.

Yoav Goldberg covers much the same ground as this blog post, but with far more detail and attention to subtlety, here: http://nbviewer.jupyter.org/gist/yoavg/d76121dfde2618422139




Viewing RNNs as a generalisation of Markov chains is a bit confusing, because what you're calling a Markov chain isn't really a Markov chain in its most general form.

The one characteristic a Markov chain must have is that the transition probabilities are completely determined by its current state. This property is true for both RNNs and what you call Markov chains. The main difference is that the state space for RNNs is a lot bigger and better at describing the current context (it was designed to be).
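That point can be shown directly. Below is a minimal sketch of a single RNN step (hypothetical random weights, plain Python, no particular framework): the next hidden state depends only on the current hidden state and input, which is exactly the Markov property; what changes is that the state lives in a large continuous space rather than a small discrete one.

```python
import math
import random

def rnn_step(h, x, W_hh, W_xh):
    """One vanilla RNN update: new_h = tanh(W_hh @ h + W_xh @ x).
    The transition is a deterministic function of (h, x) alone."""
    n = len(h)
    return [math.tanh(sum(W_hh[i][j] * h[j] for j in range(n))
                      + sum(W_xh[i][j] * x[j] for j in range(len(x))))
            for i in range(n)]

random.seed(0)
n, m = 4, 3  # hidden size and input size, chosen arbitrarily
W_hh = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
W_xh = [[random.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(n)]

h = [0.0] * n
for x in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):  # a sequence of one-hot inputs
    h = rnn_step(h, x, W_hh, W_xh)
# The same (h, x) always produces the same next h: Markov in the hidden state.
```

The "bigger, better state space" claim is visible here: h is a real-valued vector, so the effective number of distinguishable states is astronomically larger than any n-gram table could hold.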


Formally, there is no limit to the number of states in a Markov chain.

So in this sense, an RNN is actually a kind of hidden Markov chain, one with more structure added to it. The structure might make an RNN better than a plain HMM, but it doesn't make it more general; it makes it more specific.



