There's a subtle difference here between the translation scenario and what you observed. In translation, the reversal applies only to the second sentence, which will tend to present information in the same order as the first (for most common language pairs).
The improvement in perplexity here points to gradient-propagation issues. If it's hard for the LSTM to remember information from the first sentence until it becomes useful in the second, it may help to move some of that useful information "closer" to where it is needed, and reversing the second sentence does exactly that: it puts the end of the second sentence right next to the end of the first.
I suspect that reversing the first sentence, rather than the second, could have a similar effect.
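To make the distance argument concrete, here is a toy sketch in plain Python. The sentences and the cue/dependent word pairs are invented for illustration (not from your data); the sketch just counts how many tokens separate a cue word in the first sentence from the position in the second sentence where that information is needed, under each reversal scheme.

```python
# Toy illustration of the distance argument above. The sentences and the
# cue/dependent word pairs are invented for this sketch, not from any real data.
# (Assumes the chosen cue/dependent words are unique within their sentence.)

sent1 = ["the", "storm", "hit", "the", "old", "harbour"]
sent2 = ["boats", "were", "wrecked", "all", "over", "the", "harbour"]

def gap(first, second, cue_idx, dep_idx, reverse_first=False, reverse_second=False):
    """Number of tokens between a cue word in the first sentence and the
    position in the second sentence that depends on it, measured in the
    concatenated sequence the LSTM actually reads."""
    a = list(reversed(first)) if reverse_first else list(first)
    b = list(reversed(second)) if reverse_second else list(second)
    cue_pos = a.index(first[cue_idx])            # where the cue lands after (optional) reversal
    dep_pos = len(a) + b.index(second[dep_idx])  # where the dependent position lands
    return dep_pos - cue_pos

# "harbour" at the end of sentence 1 is needed again near the end of sentence 2:
print(gap(sent1, sent2, cue_idx=5, dep_idx=6))                       # forward order: 7
print(gap(sent1, sent2, cue_idx=5, dep_idx=6, reverse_second=True))  # second reversed: 1

# "storm" near the start of sentence 1 matters for "wrecked" near the start of sentence 2:
print(gap(sent1, sent2, cue_idx=1, dep_idx=2))                       # forward order: 7
print(gap(sent1, sent2, cue_idx=1, dep_idx=2, reverse_first=True))   # first reversed: 4
```

Either reversal shortens the path for some cross-sentence dependencies (while lengthening it for others), which is consistent with the gradient-propagation explanation above.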