A senior CrowdStrike executive has apologised in front of a United States government committee for the 19 July outage that caused IT systems around the world to crash and display the feared blue screen of death after the company pushed a faulty update live.

The incident, which took place in the early morning in the UK, began when CrowdStrike issued an update to its Falcon threat detection platform. Due to a bug in the company’s automated content validator tool, a template containing “problematic” content data was cleared for deployment.

This in turn led to an out-of-bounds memory condition, which caused Windows computers receiving the update to enter a boot loop. Affected devices restarted without warning during the startup process, leaving them unable to complete a boot cycle.
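
For readers who want a concrete picture of the kind of check a content validator performs, the following is a minimal sketch in Python. It is not CrowdStrike’s code: the `Template` class, the `validate` and `can_deploy` functions, and the field counts are all hypothetical, chosen only to illustrate how a pre-deployment bounds check can reject content that would otherwise read past the end of the data supplied to it.

```python
from dataclasses import dataclass


@dataclass
class Template:
    """Hypothetical content template (illustrative only, not CrowdStrike's format)."""
    name: str
    field_indices: list[int]  # positions of the input values this template reads


def validate(template: Template, input_count: int) -> list[str]:
    """Return a description of every field index that falls outside the inputs."""
    problems = []
    for idx in template.field_indices:
        # An index below 0 or at/after input_count would be an out-of-bounds read.
        if idx < 0 or idx >= input_count:
            problems.append(
                f"{template.name}: field {idx} is outside 0..{input_count - 1}"
            )
    return problems


def can_deploy(template: Template, input_count: int) -> bool:
    """Clear a template for deployment only if validation finds no problems."""
    return not validate(template, input_count)


if __name__ == "__main__":
    # Assumed scenario: the consumer supplies 20 inputs (valid indices 0..19).
    good = Template("good", [0, 5, 19])
    bad = Template("bad", [0, 20])
    print(can_deploy(good, 20))
    print(validate(bad, 20))
```

In a memory-safe language such as Python, an out-of-range index raises an exception. In kernel-mode code written in a language without bounds checking, the same mistake can read invalid memory and crash the whole operating system, which is why this kind of check matters before content is deployed.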

The resulting chaos briefly crippled 8.5 million computers and affected organisations across the globe, with the impact felt particularly keenly in the transport and aviation sectors.

In opening remarks before the House Committee on Homeland Security in Washington DC, Adam Meyers, CrowdStrike senior vice president for counter adversary operations, said that the organisation let its customers down when it pushed the faulty update.

“On behalf of everyone at CrowdStrike, I want to apologise. We are deeply sorry this happened and are determined to prevent it from happening again,” said Meyers.

“We appreciate the incredible round-the-clock efforts of our customers and partners who, working alongside our teams, mobilised immediately to restore systems and bring many back online within hours. I can assure you that we continue to approach this with a great sense of urgency.”

He continued: “More broadly, I want to underscore that this was not a cyber attack from foreign threat actors. The incident was caused by a CrowdStrike rapid response content update. We have taken steps to help ensure that this issue cannot recur, and we are pleased to report that, as of 29 July, approximately 99% of Windows sensors were back online.

“Since this happened, we have endeavoured to be transparent and committed to learning from what took place,” said Meyers. “We have undertaken a full review of our systems and begun implementing plans to bolster our content update procedures so that we emerge from this experience as a stronger company. I can assure you that we will take the lessons learned from this incident and use them to inform our work as we improve for the future.”

Andrew Garbarino, member and chair of the Subcommittee on Cyber Security and Infrastructure Protection, said: “The sheer scale of this error was alarming. If a routine update could cause this level of disruption, just imagine what a skilled, determined, nation state actor could do.

“We cannot lose sight of how this incident factors into the broader threat environment,” he said. “Without question, our adversaries have assessed our response, recovery and true level of resilience.

“However, our enemies are not just nation states with advanced cyber capabilities – they include a range of malicious cyber actors who often thrive in the uncertainty and confusion that arise[s] during large-scale IT outages,” said Garbarino.

“CISA [the US Cybersecurity and Infrastructure Security Agency] issued a public statement noting that it had observed threat actors taking advantage of this incident for phishing and other malicious activity. It is clear that this outage created an advantageous environment ripe for exploitation by malicious cyber actors.”

Disruptions caused

Committee chair Mark Green highlighted the disruption to flights, emergency services and medical procedures, not just in the US but around the world. “A global IT outage that impacts every sector of the economy is a catastrophe that we would expect to see in a movie,” he said. “It’s something that we would expect to be carefully executed by malicious and sophisticated nation-state actors.

“To add insult to injury, the largest IT outage in history was due to a mistake,” said Green. “In this case, CrowdStrike’s content validator used for its Falcon sensor did not catch a bug in a channel file. It also appears that the update may not have been appropriately tested before being pushed out to the most sensitive part of a computer’s operating system. Mistakes happen, however we cannot allow a mistake of this magnitude to happen again.”

During his testimony, Meyers also set out details of the precise nature of the problem, and outlined the steps CrowdStrike has taken to ensure it cannot happen again, although he revealed little information that has not already been made public.

He faced close to an hour and a half of questions from US politicians, including a grilling on what support CrowdStrike provided to operators of critical national infrastructure (CNI) affected by the outage, and its own observation of the exploitation of the downtime by cyber criminals.

Kernel access

Importantly, Meyers defended the need for CrowdStrike to have access to the Windows kernel, the core part of the Microsoft Windows operating system, which manages resources and processes on the system and often hosts critical cyber security applications, including the Falcon endpoint detection and response sensor.

In the wake of the incident, some have claimed that for Microsoft to permit such access is dangerous, and a better practice would be to deploy such updates directly to users.

“CrowdStrike is one of the many vendors out there that uses the Windows kernel architecture – which is an open kernel architecture, a decision that was made by Microsoft to enable the operating system to support a vast array of different types of hardware and different systems,” said Meyers.

“The kernel is responsible for the key areas where you can ensure performance, where you can have visibility into everything happening on that operating system, where you can provide enforcement – in other words, threat prevention – and to ensure anti-tampering, which is a key concern from a cyber security perspective,” he said. “Anti-tampering is very concerning because when a threat actor gains access to a system, they would seek to disable security tools, and in order to identify that that’s happening, kernel visibility is required.

“The kernel driver is a key component of every security product that I can think of,” added Meyers. “Whether they do most of their work in the kernel or not varies from vendor to vendor, but to try to secure the operating system without kernel access would be very difficult.”


