
The Shameful Open Secret Behind Southwest’s Failure

By Zeynep Tufekci of the New York Times

Computers grow more capable and powerful every year, and new hardware is often the most visible sign of technological progress. However, even with the shiniest hardware, the software that plays a critical role inside many systems is too often antiquated, and in some cases decades old.

This failing appears to be a key factor in why Southwest Airlines couldn’t return to business as usual the way other airlines did after last week’s major winter storm. More than 15,000 of its flights were canceled starting Dec. 22, including more than 2,300 this past Thursday — almost a week after the storm had passed.

It’s been an open secret within Southwest for some time, and a shameful one, that the company desperately needed to modernize its scheduling systems. Software shortcomings had contributed to smaller-scale meltdowns, and Southwest unions had repeatedly warned about them. Without more government regulation and oversight, and greater accountability, we may see more fiascos like this one, which most likely stranded hundreds of thousands of Southwest passengers (perhaps more than 1 million) over Christmas week. And not just at a single company: the problem is widespread across many industries.

This problem — relying on older or deficient software that needs updating — is known as incurring “technical debt,” meaning there is a gap between what the software needs to be and what it is. While aging code is a common cause of technical debt in older companies — such as with airlines that started automating early — it can also be found in newer systems, because software can be written in a rapid and shoddy way rather than in a more resilient manner that makes it more dependable and easier to fix or expand. As you might expect, the former is cheaper and quicker.

It’s a bit like constructing a building. If you had the option of not adhering to strict earthquake or fire codes (i.e., if there was little or no regulation or oversight), it would almost inevitably be cheaper and quicker to skip such niceties. The building might look and feel the same to its inhabitants, as long as there was no earthquake or fire. But if there were an earthquake or fire, the “debt” would be paid by the endangered inhabitants of the building.

Which brings us back to Southwest. Throughout the past year, members of the flight attendants union picketed in front of various airports as part of contract negotiations. One protest sign they carried? A placard declaring “Another Victim of SWA’s Outdated Technology,” with a graphic showing a stuck software progress bar. In September, they put the same sign lamenting the company’s outdated technology on the side of a truck and drove it in circles around Love Field (Southwest’s core airport) in Dallas as well as the nearby Southwest headquarters. In March, in its open letter to the company, the union even placed updating the creaking scheduling technology above its demands for increased pay.

Likewise, in October 2021, when Southwest experienced another cancellation crisis, the president of the pilots union pointed out that the antiquated crew scheduling technology was leading to cascading disruptions. Even as then-Southwest CEO Gary Kelly objected to the pilots’ claims, saying Southwest had “wonderful technology,” he conceded that their tools could use improvement.

That improvement seems not to have occurred.

Lyn Montgomery, president of Southwest’s flight attendants union, told me that currently, when hiccups or weather events happen, the employees have to go through a burdensome, arduous process to get things sorted because Southwest hasn’t sufficiently modernized its crew scheduling systems.

For example, if members of a crew from Buffalo, New York, don’t arrive in Baltimore because their flight was canceled, the employees have to manually call in to let the company know where they are and get hotels arranged for them.

Montgomery told me that employees had sent in screenshots that showed them being left on hold on the phone for three, six, seven, eight, 12 hours, and even one of 17 hours, just to let the company know their whereabouts and get hotel rooms arranged. During such waits, they could “time out,” a phrase relating to a Federal Aviation Administration safety requirement that mandates a certain amount of rest between flights. The result is that once the employees finally reached the company, they weren’t allowed to fly, even if they were at an airport with a flight that needed them. Online forums are full of employee accounts of such misery.

Meanwhile — extending our example from above — Southwest would have to find a crew in Baltimore to replace the one that never arrived from Buffalo. But the potential candidates in Baltimore might also be on hold for hours, trying to let the company know their whereabouts.

You can see how this can easily cascade to a systemwide halt, as happened this past week.

You might be wondering how Southwest can lose track of where the crews are, and why anyone has to call in at all, since the company presumably should know exactly which flights got canceled and who flew where, based on passenger lists. Southwest did have an old system, but Montgomery says it broke down during even mild hiccups, forcing employees to call in.

Why can’t the crews simply notify the company of their whereabouts via an app or a website, and even get their hotel assignments that way? John Brant, vice president for product strategy at Arcos, a company that sells workforce management software to airlines and other companies, told me that that’s how it works for many other airlines. But that’s yet another layer of software that has to be written and integrated into whatever software the airline uses for scheduling personnel.

Southwest concedes that technology played a role in the fiasco, but it has not acknowledged the past decisions that contributed to why this happened now.

“Our systems were overwhelmed by the scale of the disruption,” Chris Perry, a Southwest spokesman, told me. “We had available crews and aircraft, but our technology struggled to align our resources due to the magnitude and scale of the disruptions. As a result, our crew schedulers tackled the issue manually, which is a tedious, long process that takes time and trained resources to accomplish.”

Such breakdowns resulting from technical debt are often triggered by external events like weather and can be worsened by other dynamics, such as the fact that Southwest runs more “point-to-point” flights than most airlines. Most carriers use a hub-and-spoke model, in which passengers are ferried from their origin to major hubs like Atlanta and Dallas and then put on planes to their final destinations. But the point-to-point flight model, which has its advantages, doesn’t fully explain how Southwest still couldn’t start flying its regular schedule until a week after the storm had passed.

So why didn’t Southwest simply update its software and systems?

Well, if you are a corporate executive whose compensation is tied to stock prices and earnings statements released every three months, there are strong incentives to address any immediate problem by essentially adding a bit of duct tape and wire to what you already have, rather than spending a large amount of money — updating software is costly and difficult — to address the root problem. Then you can cross your fingers and hope that whatever catastrophe may be in the making erupts under someone else’s future tenure. Such bets often pay off since, increasingly, the plight of a company’s customers and employees is divorced from the immediate fortunes of its current top executives.

In 2020, for instance, Kelly’s compensation was a record $9.2 million, despite the fact that the company lost more than $3 billion that year because of the pandemic, and the compensation for the median employee fell by $35,000, to about $66,000. (The company said Kelly’s compensation had been set in place before the pandemic.) In the years leading up to the pandemic, while the company’s aging scheduling technology groaned, the company spent $8.5 billion of its excess cash on purchasing its own stock — a common practice among airlines that helps increase the value of the stock, the main form of compensation for many executives. Then, when the pandemic hit, like other airlines, Southwest received billions from the government in grants and low-interest loans. Kelly, an accountant who became the CEO of Southwest in 2004, retired this year with an estimated net worth in the tens of millions of dollars, so the crisis did indeed occur under someone else’s tenure.

Ultimately, the problem is that we haven’t built a regulatory environment where companies have incentives to address technical debt, rather than passing the burden on to customers, employees or the next management.

What would proper incentives look like? It would differ by industry. For airlines, it might mean holding them responsible for the problems their miserly approach causes to the flying public. To start with, they could be forced to compensate passengers for delays or cancellations that go beyond reasonable expectations because of weather or events outside their control. (Europe has such a rule, though the implementation has hit a lot of snags.)

Companies can also be substantively fined for major failures like this one. But if the fines are too small, companies will just see them as a cost of doing business and carry on.

For example, after the 2017 Equifax breach, which exposed sensitive information from 143 million Americans because the company failed to institute a routine security update to its software, the company agreed to pay a penalty of at least $575 million to the Federal Trade Commission. That may sound like a lot, but it was just a few dollars per affected customer and a mere 15 percent of the company’s revenue in 2018, the year after the hack. I’m sure Equifax would have much preferred not to have been fined, but it was still a cost the company could endure, especially for those lucky enough to inhabit the executive suites. Equifax CEO Richard Smith did resign. But despite the failure and the fine, he also collected $18 million in pension money on his way out the door.

This is why we can’t just keep turning over the operation of more and more of our infrastructure to antiquated software and self-interested executives. Technical debt is real debt. It will eventually be paid by someone. And unless we take steps to hold companies and executives accountable for preventable, and foreseeable, failures, it will be the public that keeps paying.