American airline Delta is caught in a blame game with Microsoft and CrowdStrike as it threatens to seek damages for costs following last month’s unprecedented IT outage.
In late July, cyber security vendor CrowdStrike made global headlines for releasing a botched update which bricked Windows devices across the planet.
The outage – dubbed the largest in IT history – disrupted hospitals, shut down offices and caused major flight cancellations both domestically and abroad.
In the US, the most affected carrier was arguably Delta Air Lines – which is reportedly dealing with more than 176,000 refund or reimbursement requests after cancelling nearly 7,000 flights in the wake of the outage.
Delta chief executive Ed Bastian told CNBC’s ‘Squawk Box’ these cancellations cost the company about $762 million (US$500 million) – including customer compensation – and the airline had “no choice” but to seek damages.
He went on to describe a gruelling recovery process which saw IT staff scramble to manually reset some 40,000 servers “by touch”, before signalling the company would rethink its reliance on Microsoft and CrowdStrike.
“It was terrible,” said Bastian.
“Our apologies again to our customers, our people … it was just a really, really tough situation.
“You’ve gotta test this stuff – you can’t come into a mission critical 24/7 operation and tell us ‘we have a bug’, it doesn’t work.”
Last week, Bastian confirmed to staff the company had notified CrowdStrike and Microsoft it was “planning to pursue legal claims” for losses stemming from the outage and had hired law firm Boies Schiller Flexner.
The airline has unabashedly laid the blame on CrowdStrike in its customer communications too, going so far as to offer refunds in a post titled, ‘What Delta is doing to make things right for customers impacted by CrowdStrike disruption’.
CrowdStrike: Delta turned down help
In a letter published over the weekend, CrowdStrike lawyer Michael Carlinsky signalled his client won’t take the airline’s accusations lying down.
He wrote CrowdStrike was “highly disappointed” by Delta’s suggestion that it acted inappropriately, and “strongly rejects any allegation” it was grossly negligent or committed wilful misconduct with respect to the incident.
“CrowdStrike continues to work closely and professionally with the Delta information security team,” Carlinsky wrote to Delta lawyer David Boies.
“Delta’s public threat of litigation distracts from this work and has contributed to a misleading narrative that CrowdStrike is responsible for Delta’s IT decisions and response to the outage.”
Carlinsky further noted Delta turned down an offer for onsite support, and CrowdStrike chief executive George Kurtz directly reached out to his Delta counterpart Bastian only to get no response.
He closed the letter by stating that, “should Delta pursue this path”, it would need to explain why its competitors all restored operations “much faster” and why the company “turned down free onsite help”, adding that any liability by CrowdStrike is contractually capped “at an amount in the single-digit millions”.
Meanwhile, a separate letter from Microsoft lawyer Mark Cheffo deemed Delta’s public comments “incomplete, false, misleading and damaging to Microsoft and its reputation”.
According to Cheffo, Delta also turned down free assistance from Microsoft (which the tech giant offered despite its software not having caused the outage) and Microsoft chief executive Satya Nadella similarly emailed Bastian to no response.
The most damning remark from Cheffo was that Delta likely refused Microsoft’s help because its crew-tracking and scheduling system was serviced by other providers, such as IBM, rather than Microsoft Windows or Azure.
CrowdStrike’s full root cause analysis
Earlier this week, CrowdStrike followed up its initial explanation of the outage with a full root cause analysis (RCA) – revealing a mixed bag of testing processes and technical blunders.
CrowdStrike’s outage began 19 July when a routine update to its cyber security software, Falcon, contained a bug which affected an estimated 8.5 million Windows devices.
While receiving some new data intended to help prevent security threats such as malware, Falcon was handed 21 input fields when it expected to get 20.
This resulted in a “parameter count mismatch” – or simply put, an error – which caused affected systems to crash.
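As a rough illustration of this class of bug, the Python sketch below shows a count check that runs before any field is read. It is not CrowdStrike's code: the function names and constants are hypothetical, and the 20 and 21 figures are simply those quoted above.

```python
# Hypothetical figures taken from the counts quoted in the article.
SUPPLIED_INPUTS = 20   # values actually available to the reader
DECLARED_FIELDS = 21   # fields the new data says exist


def validate_field_count(declared: int, supplied: int) -> None:
    """Raise before any field is read if the two counts disagree."""
    if declared != supplied:
        raise ValueError(
            f"parameter count mismatch: {declared} fields declared, "
            f"{supplied} supplied"
        )


def read_fields(inputs: list, declared: int) -> list:
    """Return the first `declared` inputs, but only after the counts are checked."""
    validate_field_count(declared, len(inputs))
    return [inputs[i] for i in range(declared)]
```

Calling `read_fields(list(range(SUPPLIED_INPUTS)), DECLARED_FIELDS)` raises a `ValueError` instead of attempting to read a 21st value that was never supplied. In lower-level code with no such check, that kind of read can bring down the whole process.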
In a somewhat opaque breakdown, CrowdStrike debunked suspicions that it failed to test the botched update before it was rolled out – though its testing processes appeared inadequate for picking up the underlying error.
“This parameter count mismatch evaded multiple layers of build validation and testing, as it was not discovered during the sensor release testing process, the Template Type (using a test Template Instance) stress testing or the first several successful deployments of IPC Template Instances in the field [sic],” CrowdStrike explained.
This explanation was met with mixed reception.
Jon Robertson, founder of Australian cyber security company Tarian Cyber, summarised that CrowdStrike’s checks and validations “didn’t work as expected”, errors went undetected and staged rollouts of the update “didn’t exist”.
“Brutal findings, but refreshingly transparent, which is how it should be,” said Robertson.
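For readers unfamiliar with the term, a staged rollout pushes an update to a small share of machines first and only widens the release if those machines stay healthy. The Python sketch below is a generic illustration under assumed names (`staged_rollout`, `deploy`, `is_healthy` and the percentage stages are all hypothetical) and does not describe CrowdStrike's actual deployment system.

```python
def staged_rollout(hosts, percent_stages, deploy, is_healthy):
    """Deploy to growing shares of hosts, halting after the first unhealthy stage.

    Returns the list of hosts that received the update.
    """
    updated = 0
    for percent in percent_stages:
        # Number of hosts that should be updated once this stage completes.
        target = len(hosts) * percent // 100
        batch = hosts[updated:target]
        for host in batch:
            deploy(host)
        updated = max(updated, target)
        # Stop widening the release as soon as any host in the batch is unhealthy.
        if not all(is_healthy(host) for host in batch):
            break
    return hosts[:updated]
```

With 1,000 hosts and stages of 1, 10 and 100 per cent, a faulty update that fails its health check would stop after reaching the first 10 machines rather than all 1,000.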
Patrick Gray of the Risky Business podcast found the RCA lacked detail and was “pretty awfully written”, while information security professional ‘munin’ picked apart much of its jargon in a scathing thread on social networking platform Mastodon.
“This is turd-polishing,” wrote munin.
“They allowed wildcards in the 21st field initially, and then disallowed them but didn't test that change.”
CrowdStrike apologised “unreservedly” for the incident, while Kurtz released a follow-up to his initial statements, stating the company is using the “lessons learned from this incident” to better serve its customers.
“We have already taken decisive steps to help prevent this situation from repeating, and to help ensure that we – and you – become even more resilient,” said Kurtz.
CrowdStrike is also facing a class action lawsuit which claims it misled shareholders and did not properly test Falcon updates before rolling them out.