Per instance (a worker serving an API request), it requires 8x GPUs. I believe they have thousands of these instances and scale them up with load.

Because the model isn't dynamic (it doesn't learn at serving time), it is stateless and can be scaled elastically.
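
As a toy sketch of what that buys you (hypothetical Python with made-up names like FrozenModel and Worker, not their actual stack): since no request mutates the weights, any replica can serve any request, so scaling out is just adding replicas behind a load balancer.

    import itertools

    class FrozenModel:
        """Stand-in for the real model: weights are read-only at serve time."""
        def generate(self, prompt):
            return f"completion for: {prompt}"

    class Worker:
        def __init__(self):
            self.model = FrozenModel()  # no per-user state lives here

        def handle(self, prompt):
            return self.model.generate(prompt)  # nothing carried between calls

    # Scaling up under load is just appending more Workers to this list.
    workers = [Worker() for _ in range(3)]
    dispatch = itertools.cycle(workers)  # round-robin: any replica will do

    def serve(prompt):
        return next(dispatch).handle(prompt)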


Ah okay, that makes a lot more sense, thank you!


I expect some level of caching, and even request bucketing by similarity, is possible.
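
A minimal sketch of what a response cache might look like (hypothetical helper names, not a real API; run_model stands in for the expensive multi-GPU call). Note the key has to cover everything that affects the output, not just the prompt text, and a cached answer is only valid if decoding is deterministic:

    import hashlib

    def run_model(prompt, temperature, seed):
        # Placeholder for the expensive multi-GPU forward pass.
        return f"completion for: {prompt}"

    cache = {}

    def cache_key(prompt, temperature, seed):
        # Key on everything that affects the output, not just the prompt.
        raw = f"{prompt}|{temperature}|{seed}".encode()
        return hashlib.sha256(raw).hexdigest()

    def serve_cached(prompt, temperature=0.0, seed=0):
        key = cache_key(prompt, temperature, seed)
        if key not in cache:
            cache[key] = run_model(prompt, temperature, seed)
        return cache[key]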

How many users come with the same prompt?


In my experience, running the same prompt always gets different results. Maybe they cache across different people, but I'm not sure that'd be worth the cache space at that point? Although 8x A100s is a lot to not have caching...
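
The different results are consistent with sampling: the model outputs a probability distribution over next tokens, and with temperature > 0 the server samples from it instead of taking the argmax, so identical prompts legitimately diverge. A toy illustration (not the actual decoder):

    import random

    def sample_next_token(probs, temperature=1.0):
        if temperature == 0:
            return max(probs, key=probs.get)  # greedy decoding: deterministic
        # Reweight by temperature, then draw a sample.
        weights = {t: p ** (1.0 / temperature) for t, p in probs.items()}
        total = sum(weights.values())
        tokens = list(weights)
        return random.choices(tokens, weights=[weights[t] / total for t in tokens])[0]

    probs = {"cat": 0.5, "dog": 0.3, "fish": 0.2}
    print([sample_next_token(probs) for _ in range(5)])     # varies run to run
    print([sample_next_token(probs, 0) for _ in range(5)])  # always "cat"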



