Sequence length is 2048 (page 10). A rough guess based on the compute figures on page 65 is that a single inference pass over 2048 tokens costs about 1 petaFLOP (10^15 floating-point operations), which may not be that bad if your communication is fast (seconds?).
According to https://cloud.google.com/tpu, each individual TPUv3 delivers 420 teraFLOPS, and TPUv4 is supposed to double that performance, so if that guess is correct, inference should take a few seconds: roughly 1 petaFLOP / 420 teraFLOPS ≈ 2.4 s on a TPUv3 and about half that on a TPUv4, assuming peak throughput. Quite impressive really.
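
For anyone who wants to redo the arithmetic with different numbers, here is a quick sketch of the estimate. The 1 petaFLOP cost and the "TPUv4 is double TPUv3" figure are the guesses from above, not measurements, and the `utilization` parameter is my own addition to show how far things move if the hardware runs below peak (the 50% value is purely hypothetical).

```python
# Back-of-the-envelope inference time estimate.
# All inputs are guesses from the post above, not measured values.

PETA = 1e15
TERA = 1e12


def inference_seconds(total_flop, device_flops, utilization=1.0):
    """Seconds needed to run total_flop operations on a device.

    device_flops is the peak rate in operations per second;
    utilization is the (assumed) fraction of peak actually achieved.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return total_flop / (device_flops * utilization)


inference_flop = 1 * PETA        # guessed cost of one 2048-token pass
tpu_v3_flops = 420 * TERA        # figure quoted from cloud.google.com/tpu
tpu_v4_flops = 2 * tpu_v3_flops  # assumption: v4 doubles v3

print(f"TPUv3 at peak: {inference_seconds(inference_flop, tpu_v3_flops):.2f} s")  # 2.38 s
print(f"TPUv4 at peak: {inference_seconds(inference_flop, tpu_v4_flops):.2f} s")  # 1.19 s
# Hypothetical 50% utilization, just to show the sensitivity.
print(f"TPUv3 at 50%:  {inference_seconds(inference_flop, tpu_v3_flops, 0.5):.2f} s")  # 4.76 s
```

This ignores communication and memory limits entirely, so treat it as a lower bound under the stated guesses rather than a prediction.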