The evidence the author offers that Markov chains are as effective as RNNs is that if you're given a tiny snippet of the output of each it can be hard to tell which is which.

But the thing that (both in theory and in practice) distinguishes RNNs from simpler constructs with a small, fixed amount of state is precisely what happens with not-so-tiny amounts of output. RNNs can produce sizable syntactically valid chunks of languages with "nested" structure -- open and close tags in XML, parentheses in Lisp, curly braces in C, etc. A fixed-order Markov chain can't, because it sees only the last few tokens and so has no way to track how many brackets are still open. It seems reasonable to guess that RNNs will also be able to produce better-structured natural language text than Markov chains ever will.
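To make the nesting point concrete, here is a minimal Python sketch, not anything from the article. It assumes a character-level Markov chain of order K = 3 trained on a toy corpus of balanced parentheses; the names train, context and depth are made up for illustration. It shows that two prefixes with different numbers of open parentheses collapse to the same context, so the chain has to use the same next-character distribution for both.

```python
from collections import Counter, defaultdict

def train(corpus, k):
    """Count next-character frequencies for every k-character context."""
    model = defaultdict(Counter)
    for text in corpus:
        padded = "^" * k + text + "$"  # "^" pads the start, "$" marks the end
        for i in range(k, len(padded)):
            model[padded[i - k:i]][padded[i]] += 1
    return model

def context(prefix, k):
    """The only thing an order-k chain sees: the last k characters."""
    return ("^" * k + prefix)[-k:]

def depth(prefix):
    """Number of parentheses still open, which a chain would need to track."""
    return prefix.count("(") - prefix.count(")")

K = 3  # assumed order, chosen only for illustration
corpus = ["(" * n + ")" * n for n in range(1, 9)]  # toy balanced strings
model = train(corpus, K)

shallow = "(" * 4  # 4 parentheses still open
deep = "(" * 7     # 7 parentheses still open

# Both prefixes end in the same K characters, so the chain cannot tell them apart.
same_context = context(shallow, K) == context(deep, K)
print(same_context, depth(shallow), depth(deep))
print(dict(model[context(shallow, K)]))
```

Raising K only moves the problem to deeper nesting. Something with unbounded or learned state (a counter, a stack, or an RNN's hidden vector) is not limited in this particular way.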

Show us a page of fake Shakespeare from the RNN and a page from the Markov chain, and I don't think it will be so difficult to tell which is which.