There are a lot of reasons TPUs could be poorly matched for your workload: model complexity (or lack thereof), how the model is set up, how it's fed (if you're bottlenecked on IO, including host <-> TPU memory bandwidth, no accelerator will help), and how you trained it, including the evaluators you used.
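To illustrate the IO point, a minimal sketch of a host-side input pipeline that overlaps data prep with TPU compute (assumes TensorFlow; the bucket path, feature spec, and parse_fn are hypothetical placeholders, not from the original post):

    import tensorflow as tf

    def parse_fn(record):
        # Hypothetical feature spec, for illustration only.
        features = tf.io.parse_single_example(
            record,
            {"image": tf.io.FixedLenFeature([], tf.string),
             "label": tf.io.FixedLenFeature([], tf.int64)})
        image = tf.io.decode_jpeg(features["image"], channels=3)
        image = tf.image.resize(image, [224, 224])
        return image, features["label"]

    dataset = (
        tf.data.TFRecordDataset(tf.io.gfile.glob("gs://your-bucket/train-*.tfrecord"))
        .map(parse_fn, num_parallel_calls=tf.data.AUTOTUNE)  # parallelize decoding on the host
        .batch(1024, drop_remainder=True)                    # large, fixed batch shapes suit TPUs
        .prefetch(tf.data.AUTOTUNE)                          # overlap host prep with TPU steps
    )

If a pipeline like this still can't keep the device busy, the TPU will sit idle regardless of how fast it is, and a GPU box with local storage may simply look better.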
Since your post didn't actually include any details, I did a search and immediately found an article[1] where TPUs worked better for their particular use-case. I suspect I could find many such reports (and probably some opposite reports too).
It's unfortunate they didn't work for you; perhaps you should give them another shot with a different model. I'd recommend using Cloud's examples as a starting point.
[1] https://medium.com/bigdatarepublic/cost-comparison-of-deep-l...