I have always really wondered how airline scheduling software works. Despite being pretty good with algorithms, I just have no idea how you'd make a system that's robust to weather and mechanical delays.
The field of science (math) that studies this (and applies it) is called "Operations Research"[0] and it's about optimization & planning. About 30 years ago they started applying it to airline scheduling; here are a few random papers I found:
One important component is slack. Every airline at every airport should have a certain number of crews and airplanes capable of providing service in place of a delayed flight. Running on maximum efficiency for airplanes and staff means unexpected delays will cause cascading failures. Weather can be forecasted, and additional crews can be routed to replace probable future cancelled flights. Temporary staff and increased hours can be utilized for peak demand seasons. We saw similar problems with manufacturing failures when the supply chain became unreliable because of a lack of slack. This type of slack can be seen as an inefficiency and costs money, so it's unsurprising to see budget airlines struggling.
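To make the cascading-failure point concrete, here is a minimal sketch of how a single delay propagates through one aircraft's day. The function name, the minute values, and the assumption that each ground turn absorbs a fixed amount of delay are all made up for illustration; real schedules are far messier.

```python
def propagate_delays(initial_delay_min, num_legs, slack_per_turn_min):
    """Return the departure delay (in minutes) of each leg flown by one aircraft.

    Assumption: every ground turn can absorb up to `slack_per_turn_min`
    minutes of the delay inherited from the previous leg.
    """
    delays = []
    delay = initial_delay_min
    for _ in range(num_legs):
        delays.append(delay)
        # Slack in the turn eats into the delay, but never makes a flight early.
        delay = max(0, delay - slack_per_turn_min)
    return delays


# A 60-minute morning delay on an aircraft scheduled for five legs.
tight = propagate_delays(60, 5, slack_per_turn_min=0)
padded = propagate_delays(60, 5, slack_per_turn_min=20)
print(tight)   # every leg inherits the full delay
print(padded)  # the delay is absorbed after a few turns
```

With zero slack the delay never goes away and every later leg (and every crew and passenger connecting off it) is hit. With 20 minutes of padding per turn the same disruption dies out by the fourth leg, which is the efficiency-versus-robustness trade-off described above.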
Another important component is disaster recovery. How quickly can the system recover from missed flights? What is the game plan for dealing with crews/airplanes that are out of place? How will they return to normal operations? Oftentimes, having a playbook everyone is working from can lead to faster recoveries than dealing with each individual crisis as it happens, often with either too much micromanagement from leadership or too little coordination between departments. The playbook generates a consensus before the system is stressed.
> Every airline at every airport should have a certain number of crews and airplanes capable of providing service in place of a delayed flight.
Airline pax are probably not willing to pay for spare standby aircraft and flight and cabin crews at every airport every airline operates from.
Southwest’s original low-cost carrier business innovation was to run an all-737 fleet and make business-wide efforts to optimize for fast ground-turns, in order to get more flights out of each aircraft.
Certainly not, that would destroy profit and competitiveness. Spare airplanes are mostly in for non-essential maintenance, and spare crews can be called up in an hour or two. That's good enough. Catastrophic outages every few years still cost less than building decent redundancy into all operations.
This is partly because airlines are still externalizing a good portion of the cost onto their customers, who need to rebook at short-term pricing. I'd love to see legislation to address this loophole.
Unpopular opinion but airline tickets are way too cheap for what they are doing. My last trip to Vegas, the Uber ride to the airport was more expensive than the airline ticket. My Uber money went toward 1. The driver’s labor, 2. The car and gas, and 3. Uber’s (mostly engineering) overhead. That’s it. And it was like $150! My airline ticket pays for pilots with decades of training, dozens of trained professionals and support agents, baggage handling, security, airport operations, sometimes meal service and entertainment, not to mention the wizardry of launching me 30kft into the air so I can get to another state in an hour. All that for $99?
For comparison, a Greyhound bus on the same route I was about to take a Southwest flight on was about $160 and took 36 hours with 2 transfers compared to $250 on Southwest with one transfer and 5 hours total travel time.
I don't pay for extra standby aircraft, but additional flight availability is why I pick one airline over another. If you choose to fly Spirit, and your flight is canceled or delayed for any reason, you might not make it to your destination for days. With a major carrier, you'll simply be rebooked on the next flight.
Southwest used to be a budget-friendly airline with decent service. Now they're priced as much as or more than the other major carriers, with the added friction of having to search for and book flights only on their site.
> One important component is slack. Every airline at every airport should have a certain number of crews and airplanes capable of providing service in place of a delayed flight.
Airlines have crew on "reserve" at all times near bases to handle this problem. They are being paid to sit around and not actually work unless called in. Pilots love to try to get on the reserve list for obvious reasons.
I don't know how Southwest handles reserve, since they don't have "bases" like other airlines do.
> Every airline at every airport should have a certain number of crews and airplanes capable of providing service in place of a delayed flight.
Good luck finding pilots to be “on-call” to fly anywhere in the world (and most commonly to small US cities) on a moment's notice, with a jump seat return flight as their way home (after a night in a small city hotel).
You'll find that almost all airlines keep staff on call in various places and with various reporting times, because already at very small scale, you'll have some crew not making it to work for whatever reason all the time.
Maintaining right-sized and right-placed operational buffers is an entire sub-category within airline scheduling software/consultancy.
Those buffers will never cover a major disaster of course. They should let you hit your on-time and cancellation targets at smallest possible cost, though.
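As a rough sketch of what "right-sizing" a buffer can mean: assume each scheduled crew member independently fails to report with some fixed probability, and look for the smallest number of reserves that covers the absences with a target probability. The independence assumption, the function names, and the numbers are all hypothetical; real buffer planning uses much richer data.

```python
from math import comb


def prob_at_most(k, n, p):
    """P(at most k absences) when n people are each absent with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def min_reserves(n_scheduled, p_absent, target_coverage):
    """Smallest reserve count whose coverage probability meets the target."""
    for reserves in range(n_scheduled + 1):
        if prob_at_most(reserves, n_scheduled, p_absent) >= target_coverage:
            return reserves
    return n_scheduled


# 20 crew scheduled at a base, 5% chance each one cannot report.
print(min_reserves(20, 0.05, 0.95))
print(min_reserves(20, 0.05, 0.99))
```

Raising the coverage target raises the reserve count, and each extra reserve is a cost that a normal day never uses. A correlated event (a storm, a system outage) breaks the independence assumption entirely, which is why buffers like this do not cover a major disaster.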
I worked on the medical resident scheduling problem for a while, and there is a giant body of work on all kinds of staff scheduling problems going back to the 1960s at least.
The two classes of solutions that I considered were optimization solvers (see Gurobi Optimization for example), and meta-heuristics (see the book Metaheuristics: From Design to Implementation). If I remember correctly, the people at Gurobi started at a previous company which was spun out of an airline, but I might be confused. All the algorithms in both classes of solutions are so nuanced that it can take years to begin to grasp how their strengths and weaknesses interact with your particular scheduling challenge, and how the way you formulate the problem interacts with the ability of the algorithm to solve it.
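For a feel of the heuristic side, here is a toy sketch of plain hill climbing (a basic local search, about the simplest thing in that family) on a staff-to-shift assignment. It is not what a solver like Gurobi does, and the penalty weights, names, and data are invented for illustration.

```python
import random


def cost(assignment, unavailable, max_shifts):
    """Penalty: 10 per shift given to someone unavailable, 1 per shift over a person's cap."""
    penalty = 0
    load = {}
    for shift, person in enumerate(assignment):
        if shift in unavailable.get(person, set()):
            penalty += 10
        load[person] = load.get(person, 0) + 1
    penalty += sum(max(0, n - max_shifts) for n in load.values())
    return penalty


def hill_climb(staff, num_shifts, unavailable, max_shifts, iterations=2000, seed=0):
    """Start from a random schedule and keep single-shift changes that do not make it worse."""
    rng = random.Random(seed)
    current = [rng.choice(staff) for _ in range(num_shifts)]
    current_cost = start_cost = cost(current, unavailable, max_shifts)
    for _ in range(iterations):
        candidate = current.copy()
        candidate[rng.randrange(num_shifts)] = rng.choice(staff)
        candidate_cost = cost(candidate, unavailable, max_shifts)
        if candidate_cost <= current_cost:
            current, current_cost = candidate, candidate_cost
    return current, current_cost, start_cost


staff = ["ana", "ben", "chi", "dev"]
unavailable = {"ana": {0, 1}, "ben": {5}, "chi": {2, 3}}  # shift indices each person cannot work
schedule, final_cost, start_cost = hill_climb(staff, 8, unavailable, max_shifts=2)
print(schedule, final_cost, start_cost)
```

Even in this toy, the formulation matters more than the search loop: changing the relative penalty for an unavailable assignment versus an overloaded person changes which schedules the search will accept, which is a small version of the formulation sensitivity mentioned above.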
All that said, the real problem for me was a human one: If you produce a viable schedule X, the organization involved will always want to alter the rules to stretch the available resources to cover more, and simultaneously all the scheduled staff will want more flexibility and nuance in expressing their preferences. You, as the author of scheduling software, are caught between them. Neither side is ever happy with the result.
I occasionally daydream about revisiting resident scheduling (I don't recommend it: the people who use your software leave every year, are not business oriented, and don't understand the complexity of the task until they've tried it on their own in their first and only attempt). If I did, I would focus less on algorithms, and more on incentives to reconcile the tension between the organization, which wants to cover the most shifts with the fewest people at the cost of flexibility and preferences, and the staff, who want more flexibility and more preferences satisfied. I think that is the core problem at a business level.
The easiest way to solve the tension is probably to add additional money into the mix: the hard-to-fill shifts get paid a bonus, etc. Someone would figure out how to game it, of course, but you already have this somewhat when overnights pay more.
Sort of. They bid based on seniority. So if you've been there forever you get the cushy flight that pays a ton. If you just joined, you get the worst shift nobody wants (because it's the last one left).
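Seniority bidding as described here can be sketched as a simple ordered pick: go down the seniority list and give each person their highest-ranked option that is still free. The names and the "line" labels are hypothetical, and real bidding systems layer many more rules on top.

```python
def assign_by_seniority(bids, seniority_order):
    """Give each person, most senior first, their top remaining choice.

    bids: dict mapping person -> list of options in preference order.
    seniority_order: list of people, most senior first.
    """
    taken = set()
    result = {}
    for person in seniority_order:
        # First preference that nobody more senior has already claimed.
        choice = next((opt for opt in bids[person] if opt not in taken), None)
        result[person] = choice
        if choice is not None:
            taken.add(choice)
    return result


bids = {
    "veteran": ["hawaii_day", "chicago_day", "redeye"],
    "midcareer": ["hawaii_day", "chicago_day", "redeye"],
    "new_hire": ["hawaii_day", "chicago_day", "redeye"],
}
print(assign_by_seniority(bids, ["veteran", "midcareer", "new_hire"]))
```

When everyone ranks the options the same way, the newest person ends up with whatever is left, exactly as the comment says.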
I don't have experience with airline scheduling but I have experience with software in large financial corporations like banks. The situation with banks (at least the ones I worked with) is that there is a huge amount of software maintained mostly by mediocre to bad teams. These teams fail a lot, the software fails a lot, and yet everything seems to keep going.
It is not about code or algorithm quality, it is about the procedural side of things -- how the organisation is "programmed" to respond to failures. I use the word "programmed" in a very broad sense -- for me setting up a paper checklist and being able to rely on people to follow it is the same as programming.
I suspect the main difference between airlines and banks is that banks can afford to throw money at the problem and just get things done regardless of how inefficiently.
Airlines are in a much worse position -- they were able to afford being inefficient and throwing money at the problem in the past but can't do it anymore. They work with old, outdated software that wasn't built with efficiency in mind, but now they don't have the funds to change it and are forced to maintain what they have. This may also be the answer to why sometimes they just don't have the capacity to react to a problem and let it cascade until everything comes to a halt.
Instead of framing this as saying they "don't have" the funds and were "forced" into inaction, another framing that the board of directors must consider is that current SWA leadership failed in their responsibility to recognize, prioritize, and manage the actual needs of their decades-old business.
I know a bit about this with airlines - there's a lot of thinking that goes into making the system appropriately robust, but just as much investment, if not more, goes into optimizing recovery for minimal damage as well. They have optimizers that figure out the minimal damage from a plane being taken out of schedule, an airport halting flights due to weather, or whatever the issue may be. What's cool is the math that goes into "minimal damage" with regards to passengers, crews, bags, etc.
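A very rough sketch of the "minimal damage" idea: score each candidate recovery action by a weighted sum of its effects on passengers, crews, and bags, then pick the lowest score. The option names, the counts, and especially the weights are invented for illustration; production recovery optimizers solve a much larger problem across the whole network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    stranded_passengers: int
    displaced_crews: int
    misrouted_bags: int


# Hypothetical weights: how much one unit of each kind of disruption "costs".
WEIGHTS = {"passenger": 1.0, "crew": 50.0, "bag": 0.2}


def damage(option, weights=WEIGHTS):
    """Weighted sum of the disruption an option causes."""
    return (
        weights["passenger"] * option.stranded_passengers
        + weights["crew"] * option.displaced_crews
        + weights["bag"] * option.misrouted_bags
    )


def least_damaging(options, weights=WEIGHTS):
    """Pick the option with the lowest weighted damage."""
    return min(options, key=lambda option: damage(option, weights))


options = [
    RecoveryOption("cancel_morning_rotation", 150, 1, 100),
    RecoveryOption("cancel_evening_rotation", 90, 2, 60),
    RecoveryOption("swap_in_spare_aircraft", 40, 3, 20),
]
best = least_damaging(options)
print(best.name, damage(best))
```

The interesting part is the weights: if displacing a crew is made much more expensive, the ranking flips toward an option that strands more passengers, which is the kind of trade-off the "minimal damage" math has to encode.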
You can poke around at the website for SlickOR (https://www.slickor.com/) to get an idea of the surface level work that goes into this.