First of all, thank you for chiming in here on HN. Your presence would also be welcome on the Google forum; I've read most of the posts there and no one has managed to answer this question.
If memory pressure is the issue, how are the trusted testers faring with memory pressure when they have a whole bunch of in-flight requests? If the PlusFeed fellow got 2.7 working, we'd expect to see 100% / 3.5% ≈ 28 in-flight requests. Do you have data on the base memory vs. per-thread memory requirements of these apps? Python isn't famous for freeing up memory.
Which is to say, do you have any solid numbers telling us that when we switch to 2.7 we won't face exactly the same memory pressure, leaving you to either raise the instance-hour cost, start charging for RAM, or limit in-flight requests to 3 or 4 so that our costs are only 5 times as much instead of 20?
Bottom line: what we all woke up to is the fact that, as of right now:
* you set the price of an instance, and
* you get to decide how many instances I'm going to pay you for.
Some of us are thinking that while that was a great idea when engineers were in charge, it's not such a great idea now that the bean-counters have taken over.
That's a good question. I can't point to published figures, since the 2.7 runtime is still fairly new, but based on both personal experience and fairly basic reasoning, the per-thread memory overhead is definitely far less than what the whole instance requires. The entire Python standard library, along with your framework and other libraries, is shared overhead across all the threads.
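To make the shared-overhead point concrete, here's a rough sketch (not App Engine-specific; the cache size and thread count are made up for illustration). Module-level data loaded once — analogous to the stdlib, your framework, and your app code — is referenced, not copied, by each request-handling thread:

```python
import threading

# A large structure loaded once at module level -- a stand-in for the
# stdlib, framework, and app code that every thread in an instance shares.
SHARED_CACHE = {i: "x" * 100 for i in range(10_000)}

seen_ids = []

def handle_request():
    # Each "request handler" thread references the same object;
    # no per-thread copy of the shared data is made.
    seen_ids.append(id(SHARED_CACHE))

threads = [threading.Thread(target=handle_request) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread saw the identical object, so the big cache is paid for
# once per instance, not once per thread.
assert len(set(seen_ids)) == 1
print("8 threads shared one copy of the cache")
```

The per-thread cost is essentially the stack plus whatever the handler allocates while running, which is why adding concurrency to an instance is far cheaper than adding another instance.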
The issue with charging by CPU hour was that you could occupy as many memory-seconds as you wanted without charge. That's no longer the case: by charging for instances, we're implicitly charging for the memory they use.
As for determining how many instances you run: you can control this to a large degree, both by setting budget limits and by setting scheduler parameters.