Learn from Facebook/Meta

When the Internet Broke: Mastering Network Resilience After Facebook's 2021 Outage with Incident Drill

On October 4, 2021, a faulty configuration change to Facebook's backbone network took Facebook, Instagram, and WhatsApp offline for about six hours. Incident Drill helps your team practice responding to complex network incidents like this one, so they can build resilience and limit the damage when the next outage hits.

Facebook/Meta | 2021 | Outage (Networking)

The High Stakes of Network Complexity

Modern network infrastructure is highly interdependent. A single misconfiguration, like the one behind Facebook's outage, can cascade far beyond its origin: once Facebook's backbone went down, its DNS servers withdrew their BGP route announcements, and the rest of the internet could no longer resolve Facebook's domains. Downtime on that scale is costly in both revenue and reputation. Teams need to be prepared to identify, diagnose, and resolve these issues quickly and effectively.

PREPARE YOUR TEAM

Incident Drill: Your Network Incident Training Ground

Incident Drill provides realistic incident simulations that allow your team to practice responding to network outages in a safe, controlled environment. We recreate the conditions that led to the Facebook outage, allowing your engineers to develop critical problem-solving skills, improve collaboration, and reduce response times.

🌐

Realistic Network Simulations

Work in a simulated network environment that mirrors the complexity of real-world infrastructure.

🕵️‍♀️

Guided Investigation

Follow pre-built scenarios that mimic real incident timelines, guiding your team through the investigation process.

🤝

Collaborative Problem Solving

Work together with your team to diagnose and resolve the incident, fostering better communication and coordination.

⏱️

Time-Pressure Scenarios

Practice making critical decisions under pressure, simulating the urgency of a real-world outage.

📈

Performance Analysis

Receive detailed feedback on your team's performance, identifying areas for improvement.

📚

Post-Incident Review

Analyze the incident and your team's response to identify lessons learned and improve future performance.

WHY TEAMS PRACTICE THIS

Unlock Network Resilience Through Practice

  • Reduce Mean Time To Resolution (MTTR)
  • Improve Team Collaboration and Communication
  • Identify and Address Vulnerabilities in Your Infrastructure
  • Build Confidence in Your Team's Ability to Handle Incidents
  • Minimize the Impact of Future Outages
  • Enhance Overall System Reliability
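MTTR is commonly computed as the average time between detecting an incident and resolving it (some teams measure from incident start instead). A minimal Python sketch of that calculation, assuming each incident is stored as a hypothetical `Incident` record with `detected` and `resolved` timestamps; the sample data is made up for illustration and does not come from any real incident log:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record: when the team noticed the incident and when it was fixed.
    detected: datetime
    resolved: datetime


def mean_time_to_resolution(incidents: list[Incident]) -> timedelta:
    """Average of (resolved - detected) across all incidents."""
    if not incidents:
        raise ValueError("MTTR is undefined for an empty incident list")
    total = sum((i.resolved - i.detected for i in incidents), timedelta())
    return total / len(incidents)


# Illustrative data only: two made-up incidents.
incidents = [
    Incident(datetime(2024, 1, 10, 9, 0), datetime(2024, 1, 10, 12, 0)),  # 3 hours
    Incident(datetime(2024, 3, 2, 14, 0), datetime(2024, 3, 2, 15, 0)),   # 1 hour
]
print(mean_time_to_resolution(incidents))  # 2:00:00
```

Tracking this number before and after a series of drills is one simple way to check whether practice is actually shortening response times.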

Facebook Outage Timeline

  • 11:39 AM ET: Configuration change initiated
  • 11:44 AM ET: BGP routes withdrawn [ERROR]
  • 11:49 AM ET: DNS servers unreachable [ERROR]
  • 5:20 PM ET: Services restored [SUCCESS]
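The gaps between these timestamps show how the outage unfolded: the failure cascaded within about ten minutes, while restoration took more than five and a half hours. A small Python sketch that computes the elapsed time for each event from the times listed above (all assumed to fall on the same day; the `minutes_since_start` helper is hypothetical):

```python
from datetime import datetime

# Event times copied from the timeline above; all fall on the same day.
TIMELINE = [
    ("11:39 AM", "Configuration change initiated"),
    ("11:44 AM", "BGP routes withdrawn"),
    ("11:49 AM", "DNS servers unreachable"),
    ("5:20 PM", "Services restored"),
]


def minutes_since_start(timeline):
    """Return (minutes elapsed since the first event, label) for each event."""
    parsed = [(datetime.strptime(t, "%I:%M %p"), label) for t, label in timeline]
    start = parsed[0][0]
    return [(int((ts - start).total_seconds() // 60), label) for ts, label in parsed]


for minutes, label in minutes_since_start(TIMELINE):
    print(f"+{minutes // 60}h{minutes % 60:02d}m  {label}")
# +0h00m  Configuration change initiated
# +0h05m  BGP routes withdrawn
# +0h10m  DNS servers unreachable
# +5h41m  Services restored
```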

How It Works

Step 1: Understand the Incident

Review the Facebook outage scenario and the underlying network configurations.

Step 2: Diagnose the Root Cause

Investigate the BGP route withdrawals and identify the faulty configuration change.
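One way to frame this diagnosis: replay the stream of BGP updates and find which prefixes were announced earlier but are now withdrawn. A minimal Python sketch, assuming a simplified, hypothetical `BgpUpdate` record (real BGP UPDATE messages carry many more attributes) and using reserved documentation prefixes rather than Facebook's actual address space:

```python
from dataclasses import dataclass


@dataclass
class BgpUpdate:
    # Simplified, hypothetical record of one BGP update seen from a peer.
    time: str    # e.g. "11:44"
    kind: str    # "announce" or "withdraw"
    prefix: str  # e.g. "192.0.2.0/24"


def withdrawn_prefixes(updates: list[BgpUpdate]) -> dict[str, str]:
    """Map each previously announced prefix that is now withdrawn to its withdrawal time."""
    reachable: set[str] = set()
    withdrawn_at: dict[str, str] = {}
    for u in updates:  # updates are assumed to be in arrival order
        if u.kind == "announce":
            reachable.add(u.prefix)
            withdrawn_at.pop(u.prefix, None)  # re-announced, so reachable again
        elif u.kind == "withdraw" and u.prefix in reachable:
            reachable.discard(u.prefix)
            withdrawn_at[u.prefix] = u.time
    return withdrawn_at


updates = [
    BgpUpdate("11:30", "announce", "192.0.2.0/24"),     # hypothetical DNS prefix
    BgpUpdate("11:30", "announce", "198.51.100.0/24"),  # hypothetical app prefix
    BgpUpdate("11:44", "withdraw", "192.0.2.0/24"),
]
print(withdrawn_prefixes(updates))  # {'192.0.2.0/24': '11:44'}
```

Correlating the earliest withdrawal time with recent change logs is what points investigators toward the configuration change.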

Step 3: Implement a Solution

Roll back the configuration change and restore network connectivity.
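Rolling back is simplest when every configuration change is stored as a numbered version, so "undo" means re-activating the last version known to be good. A minimal Python sketch; the `ConfigStore` class, its methods, and the `backbone_links` key are hypothetical and do not describe how Facebook or Incident Drill manage configuration:

```python
from dataclasses import dataclass, field


@dataclass
class ConfigStore:
    """Hypothetical versioned store: every change is kept, nothing is overwritten."""
    versions: list[dict] = field(default_factory=list)
    active: int = -1  # index of the version currently applied; -1 means none yet

    def apply(self, config: dict) -> int:
        """Record a new version, make it active, and return its index."""
        self.versions.append(dict(config))
        self.active = len(self.versions) - 1
        return self.active

    def rollback(self, to_version: int) -> dict:
        """Re-activate an earlier version and return the config now in effect."""
        if not 0 <= to_version < len(self.versions):
            raise IndexError(f"no such version: {to_version}")
        self.active = to_version
        return self.versions[to_version]

    def current(self) -> dict:
        if self.active < 0:
            raise LookupError("no configuration has been applied yet")
        return self.versions[self.active]


store = ConfigStore()
good = store.apply({"backbone_links": "up"})
store.apply({"backbone_links": "down"})  # the faulty change
store.rollback(good)
print(store.current())  # {'backbone_links': 'up'}
```

Re-activating an old version instead of deleting the faulty one keeps the bad change in history, which is exactly what the post-incident review in the next step needs.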

Step 4: Analyze and Learn

Conduct a post-incident review to identify lessons learned and prevent future outages.

Ready to Build a More Resilient Network?

Join the Incident Drill waitlist and be among the first to access our network incident simulations, including the Facebook outage scenario. Prepare your team for anything.

Get Early Access

  • Founding client discounts
  • Shape the roadmap
  • Direct founder support

Join the Incident Drill waitlist

Drop your email and we'll reach out with private beta invites and roadmap updates.