Sprawling CrowdStrike incident mitigation showcases resilience gaps
That a few lines of errant code could cause disruption on the scale that CrowdStrike’s update has caused over the past four days has focused unparalleled attention on the urgent need for greater resiliency and redundancy in enterprise information technology stacks worldwide.
Few expect that getting there will be easy. But almost everyone agrees that the developments of the past few days underscore the need for better preparedness, better impact mitigation, and fresh ideas for recoverability from technology failures of the sort that happened last week.
The havoc started on July 19 when a small CrowdStrike content update for the Windows version of the company’s Falcon endpoint security technology caused system failures worldwide. Numerous airlines, banks, airports, hospitals, hotels, manufacturing companies, and others reported that their Windows systems had become essentially inoperable, stuck in a blue screen of death (BSOD) state and refusing to restart despite attempts to reboot them. Microsoft estimated the faulty CrowdStrike update affected some 8.5 million Windows systems worldwide.
As if the recovery issues were not enough of a challenge, threat actors added to them this week by taking advantage of the chaos to try to distribute phishing emails, information stealers, and other malware. On July 22, for example, CrowdStrike warned of threat actors using a fake CrowdStrike recovery manual to distribute a previously unseen information stealer dubbed Daolpu. Earlier, the security vendor warned of threat actors attempting to distribute a malicious zip archive to users in South America; it purported to be a hotfix from the company but actually loaded the RemCos Trojan. Others, such as KnowBe4, reported phishing attempts using the CrowdStrike issue as a lure starting just hours after news of the problem first surfaced.
CrowdStrike: A National Security Issue?
On July 22, the US House Committee on Homeland Security demanded an explanation from CrowdStrike CEO George Kurtz on what went wrong and the measures the company will implement to prevent a similar incident in the future. In a letter to Kurtz, the committee pointed to the sheer magnitude of the disruption in the US — more than 3,000 cancelled flights, 11,800 flight delays, surgery cancellations, 911 call center outages — as reasons why the issue cannot be ignored.
“This incident must serve as a broader warning about the national security risks associated with network dependency,” Mark Green, the chairman of the committee, wrote. “Malicious cyber actors backed by nation-states, such as China and Russia, are watching our response to this incident closely.”
Both CrowdStrike and Microsoft have released updates and guidance, including self-remediation tips for remote users, to help organizations restore their systems. On Monday, Microsoft updated its recovery tool with expanded logging, error handling capabilities, and two repair options to help organizations expedite recovery.
A Mammoth Recovery Task
Even so, the task of restoring systems will be enormous and time-consuming, says Thomas Mackenzie, director of product strategy at Lansweeper. “It depends on a number of factors, including, but not limited to, whether there are backups in place to roll back to, and whether the assets are virtualized or not,” he says. “Microsoft has released a tool to fix this problem, but if the asset has BitLocker and requires the key, then it can’t be used. It’s not a trivial task if you’re talking about a lot of assets across different locations.”
Danny Jenkins, CEO at ThreatLocker, says his company’s testing shows it takes about 15 minutes per computer to recover manually — something that will be required in many cases.
To read the complete article, visit Dark Reading.