Yes. We 100% need to scale the number of GPU workers up and back down based on request queue size, which is bursty. Otherwise we could spend five figures a month on GPUs that sit idle for half the day and still have unacceptable waits during traffic spikes.
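Rough sketch of the kind of scaling rule I mean, not a real implementation. The function name, the target of 10 queued requests per worker, and the 0 to 20 worker bounds are all made-up placeholders; the real numbers would come from measured per-GPU throughput and budget.

```python
import math


def desired_workers(
    queue_depth: int,
    target_per_worker: int = 10,  # hypothetical: queued requests one worker should absorb
    min_workers: int = 0,         # hypothetical floor; 0 means scale to zero when idle
    max_workers: int = 20,        # hypothetical ceiling to cap spend
) -> int:
    """Return how many GPU workers to run for the current queue depth."""
    if target_per_worker <= 0:
        raise ValueError("target_per_worker must be positive")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")

    # Enough workers that each one has at most target_per_worker requests waiting.
    needed = math.ceil(max(queue_depth, 0) / target_per_worker)

    # Clamp to the allowed range so a spike cannot blow past the budget
    # and an empty queue cannot go below the floor.
    return max(min_workers, min(max_workers, needed))
```

Something would call this on a timer with the current queue length and reconcile the worker count toward the result. In practice we'd probably also want a cooldown before scaling down so a short lull in a burst doesn't kill workers we need again a minute later.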