Wait, It Gets Worse: Early Lessons from the CrowdStrike Crisis

Whether your organization was directly affected or observed the CrowdStrike fallout from the sidelines, the incident is a stark reminder of the importance of being ready for the unthinkable.

Here are 7 early lessons from the outage and recovery:

 

1. Prove it

Ensure that your response strategies include thorough testing with third-party vendors and service providers. This collaboration is critical for understanding their capabilities and limitations during a crisis. iluminr’s Microsimulations can facilitate joint exercises that fit easily into existing touchpoints, such as quarterly business reviews. These exercises help identify gaps and improve coordination between your organization and external partners, supporting a more cohesive and effective response.

2. Build an Unstoppable Team

The recovery efforts from the CrowdStrike incident required extensive human intervention, highlighting the need for adequate staffing. Many organizations found themselves dependent on managed service providers (MSPs) who struggled to meet the simultaneous demands of multiple clients. This underscores the importance of having sufficient internal IT staff capable of responding to such crises. Unsurprisingly, organizations that had invested in risk and response activities generally fared better during the event. This was particularly true of highly regulated industries with mature, well-exercised capabilities.

Testing scenarios under varied assumptions was routinely called out as an important part of developing response capability. Those assumptions include different types of system outage, time of day, critical staff availability (think paid time off and remote holidays), third-party constraints, and the duration of the outage.
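For teams that keep their exercise plans in a script or spreadsheet, the idea of varying assumptions can be sketched in a few lines of Python. This is a minimal illustration, not an iluminr feature: the assumption names and values below are hypothetical placeholders, and the function simply lists every combination so that none is overlooked when planning exercises.

```python
from itertools import product

# Hypothetical assumption values, for illustration only; replace with your own.
ASSUMPTIONS = {
    "outage_type": ["endpoint boot loop", "cloud region down"],
    "time_of_day": ["business hours", "overnight"],
    "key_staff": ["available", "on leave"],
    "third_party": ["responsive", "overloaded"],
    "duration": ["4 hours", "3 days"],
}


def build_scenarios(assumptions):
    """Return one dict per combination of assumption values."""
    names = list(assumptions)
    return [dict(zip(names, combo)) for combo in product(*assumptions.values())]


scenarios = build_scenarios(ASSUMPTIONS)
print(len(scenarios))  # 2 * 2 * 2 * 2 * 2 = 32 combinations
```

Even two values per assumption produce 32 scenarios, which is one reason to prioritize a handful of combinations for each exercise rather than attempt them all at once.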

iluminr Microsimulations can help prepare your team for similar scenarios by providing realistic, hands-on training that builds the necessary skills for effective incident response.

3. Ask a Scientist and a Librarian

One of the challenges encountered during the CrowdStrike recovery was navigating complex reboot processes. Regularly reviewing, documenting, and rehearsing recovery steps is essential to ensure that your team can quickly and efficiently handle such situations. iluminr’s response Playbooks offer detailed, step-by-step guides tailored to your organization’s specific needs, enabling your team to act swiftly and confidently in the face of technical hurdles.

4. A Little Foresight Saves the Day

Certain IT processes can be difficult to replicate manually during an outage, necessitating flexible and robust contingency strategies. Some of the most impacted organizations were the ones that didn’t have alternate systems or paper-based or manual procedures in place before the outage occurred. iluminr’s Microsimulations can help test these contingency strategies as if the event were unfolding, ensuring they are practical and effective under real-world conditions. This preparation helps to mitigate risks and maintain continuity during unexpected disruptions.

5. All In This Together

IT challenges often extend beyond the technical team, impacting front-line, customer-facing staff who need to handle inquiries and manage customer expectations during incidents. One airline pilot en route to the US over the Pacific shared the situation candidly with passengers and crew:

“Folks we have no idea what we’re walking into on the ground.”

It is vital to train these teams not only to understand the IT challenges and the correct procedures to follow, but also to know where to go for more information and what messaging to use. iluminr’s Microsimulations include scenarios that specifically address the role of front-line teams, equipping them with the knowledge and confidence to manage customer interactions during a disruption.

6. Say what?

Clear and timely communication is critical during any IT incident. The initial confusion during the CrowdStrike event, and the rapid identification of the problem through social media, demonstrated the need for robust communication strategies. One response executive described the back-to-basics, old-is-new challenge:

“We found the best way to get information in this scenario was from Reddit. This is highly unusual. Something we might have done 15 years ago.”

iluminr’s mass communication tools facilitate the rapid dissemination of information, ensuring that all relevant parties are informed and coordinated. This helps minimize downtime and optimize response efforts.

7. Make it Stick

Post-incident reviews are vital for continuous improvement. Instead of replacing technology, consider altering how software is deployed and controlled. Illuminating, sharing, and recording mistakes helps those learnings become part of the organizational DNA. Why stop at your own mistakes? Some of the most poignant war stories from this crisis repeated recent history. Nvidia CEO Jensen Huang described this process earlier this year in an interview at Stripe Sessions, a global internet economy conference in San Francisco:

“Feedback is learning. For what reason are you the only person who should learn this? For me to reason through it, in front of everybody, helps everybody learn how to be sensible. The problem I have with one-on-ones and taking feedback aside is you deprive a whole bunch of people of that same learning. Learning from mistakes—other people’s mistakes—is the best way to learn. Why learn from your own mistakes?”

iluminr’s Microsimulations, Learning Loops, and flexible response Playbooks support these efforts by providing a framework for continuous learning and adaptation, helping organizations to refine their processes and enhance overall security.

Where do we go from here?

The CrowdStrike incident serves as a stark reminder of the complexities involved in IT recovery and the importance of preparedness. With the right exercises, response strategies, and communication mechanisms, organizations can build a more resilient and responsive infrastructure.

What takeaways do you have from the CrowdStrike crisis? Participate in our brief survey. Your anonymous input will contribute to a comprehensive new report on capability. Join the conversation and help shape the future of digital resilience.

 

Author

Paula Fontana

VP, Global Marketing

iluminr
