> I would expect failures like this to drive the airlines towards AWS, Azure, or some similar service
I don't think that would help much. It's not really the core hardware or operating systems that tend to cause these types of outages.
More typically, it's the dependency chain between locations, applications, and services. There's also more than one system that can cause a ground halt: the check-in service, the no-fly list functionality (which the govt runs), weight/balance, crew scheduling, dispatch functions, and so on.
Check-in is a good example. You can lose it through a failure in the complex WAN, in the check-in backend service, in the no-fly service (run by the govt) or the connectivity to it, in the CRS/GDS, in the various services around check-in kiosks, in online check-in, and so forth.
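To put rough numbers on that dependency chain, here's a minimal sketch. The availability figures are made up purely for illustration (they are not real airline data), and it assumes the dependencies fail independently and that check-in needs all of them up at once:

```python
from math import prod

# Hypothetical availability per dependency (illustrative values only).
deps = {
    "wan": 0.999,
    "checkin_backend": 0.999,
    "no_fly_service": 0.999,
    "crs_gds": 0.999,
    "kiosk_services": 0.999,
}

def serial_availability(availabilities):
    """Availability of a service that needs every dependency up,
    assuming the dependencies fail independently of each other."""
    return prod(availabilities)

combined = serial_availability(deps.values())
print(f"combined availability: {combined:.4f}")
```

With five dependencies at 99.9% each, the combined figure comes out around 99.5%, which is worse than any single link. Moving one of those links to a different hosting provider doesn't change the shape of the chain.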
Once they go down, you also face an unusually high spike in request volume when you're trying to get them back up. It creates a wave that can overwhelm different parts of the system.
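One common way to blunt that kind of recovery wave is to have clients retry with exponential backoff plus random jitter, so they don't all reconnect at the same instant. This is a generic sketch, not a description of what any airline actually runs; `backoff_delay` and its parameters are hypothetical names chosen for the example:

```python
import random

def backoff_delay(attempt, base=0.5, cap=60.0, rng=random):
    """Return a randomized wait in seconds before retry number `attempt` (0-based).

    The upper bound doubles with each attempt up to `cap`, and the actual
    delay is drawn uniformly between 0 and that bound ("full jitter"),
    which spreads the retries out over time.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)

if __name__ == "__main__":
    # Show the delays one client might pick over its first few retries.
    for attempt in range(6):
        print(attempt, round(backoff_delay(attempt), 2))
```

It only helps if the clients (kiosks, agent terminals, web front ends) actually implement it. A fleet of clients that retry immediately will still produce the spike.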
For the more recent failures (across different airlines) listed above, I know one was a routing storm on the IP network, one was the check-in service, and one was the central reservations system, which I think was a botched version upgrade. Similar effects, different root causes.
Not to say it's okay, or shouldn't be addressed, but just noting that there's not really one smoking gun.