Usually the GRU requires fewer epochs to train than the LSTM.
The learning capacity of an LSTM or GRU layer is roughly proportional to its number of weights, so you need more GRU units to match the capacity of a given number of LSTM units.
It drives me nuts to see so many tutorials where people use the LSTM just because they saw other people use the LSTM, when doing the same thing with the GRU is simpler and often works just as well!
The reason is that the GRU has only two gates (reset and update) and no separate cell state like the LSTM has. With fewer parameters than the LSTM, it takes less time to train.
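To make the parameter gap concrete, here is a minimal sketch (assuming PyTorch) that compares the parameter counts of an LSTM and a GRU layer with the same input and hidden sizes. The sizes 128 and 256 are arbitrary choices for illustration; the ratio is what matters.

```python
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

input_size, hidden_size = 128, 256  # arbitrary example sizes

lstm = nn.LSTM(input_size, hidden_size)  # 4 gate blocks: input, forget, cell, output
gru = nn.GRU(input_size, hidden_size)    # 3 blocks: reset, update, candidate

print(f"LSTM parameters: {count_params(lstm):,}")  # 4 * (in*h + h*h + 2h)
print(f"GRU  parameters: {count_params(gru):,}")   # 3 * (in*h + h*h + 2h)
print(f"GRU/LSTM ratio:  {count_params(gru) / count_params(lstm):.2f}")  # ~0.75
```

Because the GRU computes three weight blocks where the LSTM computes four, a GRU of the same hidden size carries about 75% of the LSTM's weights, which is where the faster training comes from.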