Sequence length is 2048 (page 10). A guess based on the compute figures on page 65: a single inference over 2048 tokens costs about 1 petaFLOP (10^15 floating-point operations), which may not be that bad if your communication is fast (seconds?)



According to https://cloud.google.com/tpu, each individual TPUv3 delivers 420 teraflops, and TPUv4 is supposed to double that performance, so if that guess is correct, inference should take a few seconds. Quite impressive, really.
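
For what it's worth, the arithmetic behind "a few seconds" can be sketched like this. It is only a back-of-envelope estimate: the ~1 petaFLOP cost is the guess from the parent comment, the 420 teraflops figure is the one quoted from the TPU page, and the `utilization` parameter is a hypothetical knob I added, since real hardware rarely runs at peak.

```python
# Back-of-envelope estimate. Every input is a rough figure from this thread,
# not a measurement.
FLOP_PER_INFERENCE = 1e15        # ~1 petaFLOP per 2048-token inference (a guess)
TPU_V3_FLOPS = 420e12            # 420 teraFLOP/s, as quoted from the TPU page
TPU_V4_FLOPS = 2 * TPU_V3_FLOPS  # "supposed to double"; an assumption


def inference_seconds(total_flop, peak_flops, utilization=1.0):
    """Time to run `total_flop` operations on a device with `peak_flops`.

    `utilization` is a hypothetical fraction of peak actually achieved
    (1.0 means ideal peak throughput, with no communication overhead).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return total_flop / (peak_flops * utilization)


if __name__ == "__main__":
    for name, peak in (("TPUv3", TPU_V3_FLOPS), ("TPUv4", TPU_V4_FLOPS)):
        seconds = inference_seconds(FLOP_PER_INFERENCE, peak)
        print(f"{name}: {seconds:.2f} s at peak")
    # prints:
    # TPUv3: 2.38 s at peak
    # TPUv4: 1.19 s at peak
```

At a lower utilization the numbers scale accordingly, so "a few seconds" holds as long as the device stays reasonably close to peak and communication does not dominate.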



