At this point, we have stabilized service to App Engine applications. App Engine is now successfully serving at our normal daily traffic level, and we are closely monitoring the situation and working to prevent recurrence of this incident.
This morning around 7:30 AM US/Pacific time, a large percentage of App Engine’s load balancing infrastructure began failing. As the system recovered, individual jobs became overloaded with backed-up traffic, resulting in cascading failures. Affected applications experienced increased latencies and error rates. Once we confirmed this failure cycle, we temporarily shut down all traffic and then slowly ramped it back up to avoid overloading the load balancing infrastructure as it recovered. This restored normal serving behavior for all applications.
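As a rough illustration of the mitigation described above (not App Engine’s actual implementation; the names, the 30-minute ramp window, and the linear schedule are all assumptions made for this sketch), the "shut off traffic, then ramp it back up" step can be modeled as a simple admission controller that forwards a growing fraction of requests while the recovering infrastructure catches up:

import random
import time

# Hypothetical sketch only: admit an increasing fraction of requests so a
# recovering load balancer is not hit with all of the backed-up traffic at once.

RAMP_DURATION_SECS = 30 * 60  # assumed: ramp from 0% to 100% over 30 minutes


class TrafficRamp:
    def __init__(self, ramp_duration_secs=RAMP_DURATION_SECS):
        self.start = time.monotonic()
        self.ramp_duration = ramp_duration_secs

    def admit_fraction(self):
        """Fraction of traffic to admit, rising linearly from 0.0 to 1.0."""
        elapsed = time.monotonic() - self.start
        return min(1.0, elapsed / self.ramp_duration)

    def should_admit(self):
        """Probabilistically admit one request at the current fraction."""
        return random.random() < self.admit_fraction()


ramp = TrafficRamp()
# At a front end, each incoming request would be checked:
# if ramp.should_admit(): forward it to the load balancer; otherwise return 503.

The point of the gradual schedule is to keep the recovering jobs from being immediately re-overloaded by queued retries, which is what drove the cascading failures in the first place.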
We’ll be posting a more detailed analysis of this incident once we have completed our investigation of the root cause.
Regards,
Christina Ilvento on behalf of the Google App Engine Team
https://groups.google.com/forum/#!topic/google-appengine-dow...