Learn from Amazon Web Services

The $150 Million Typo:
Mastering Chaos with Incident Drill

In February 2017, a single mistyped command, entered while debugging the S3 billing system in AWS's Northern Virginia (US-EAST-1) region, knocked out a large share of the websites and services that depend on S3. Incident Drill helps you simulate this and other critical scenarios so your team learns to respond effectively under pressure and to prevent the next outage.

Amazon Web Services | 2017 | Outage (Cloud)

The High Cost of Unpreparedness

Cloud outages can have devastating consequences, impacting revenue, reputation, and customer trust. Traditional training often falls short in preparing engineers for the real-time pressure and complex dependencies inherent in large-scale incidents. Without practical experience, teams are vulnerable to making critical errors under stress.

PREPARE YOUR TEAM

Practice Makes Perfect: Incident Drill's Realistic Simulations

Incident Drill provides a platform for practicing incident response in a safe, controlled environment. Our simulations, inspired by real-world incidents like the Amazon S3 outage, allow your team to develop critical thinking skills, improve communication, and build confidence in their ability to handle high-pressure situations.

🔥

Realistic Scenarios

Experience simulated incidents based on real-world events.

Time-Pressured Environment

Practice making decisions under the stress of a live outage.

🗣️

Collaborative Response

Improve team communication and coordination during incidents.

🔎

Root Cause Analysis

Learn to identify and address the underlying causes of incidents.

📈

Performance Tracking

Measure team performance and identify areas for improvement.

⚙️

Customizable Drills

Tailor simulations to your specific infrastructure and needs.

WHY TEAMS PRACTICE THIS

Turn Chaos Into Competence

  • Reduce downtime and minimize financial losses
  • Improve team communication and collaboration
  • Identify vulnerabilities in your infrastructure
  • Build confidence in your incident response plan
  • Enhance your team's problem-solving skills
  • Strengthen your company's resilience to outages

S3 Outage Timeline

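2017-02-28 09:37 PST Engineer Runs Playbook Command with a Mistyped Input
2017-02-28 09:37 PST Far More Servers Removed Than Intended; S3 Index and Placement Subsystems Require Full Restart
2017-02-28 12:26 PST Index Subsystem Begins Serving GET, LIST, and DELETE Requests
2017-02-28 13:54 PST Placement Subsystem Recovered, S3 Operating Normally (Resolved)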
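According to AWS's post-event summary, the command was meant to remove a small number of servers, and AWS afterwards changed the tool to remove capacity more slowly and to refuse removals that would take a subsystem below its minimum required capacity. The sketch below illustrates that kind of safeguard under stated assumptions: `Subsystem`, `plan_removal`, `max_step`, and the server counts are hypothetical and are not AWS's actual tooling.

```python
from dataclasses import dataclass


class UnsafeRemovalError(Exception):
    """Raised when a capacity-removal request breaks a safety rule."""


@dataclass
class Subsystem:
    name: str
    active_servers: int
    min_required: int  # hypothetical floor below which the subsystem cannot serve traffic


def plan_removal(subsystem: Subsystem, requested: int, max_step: int = 5) -> int:
    """Check a removal request before any server is touched.

    Refuses requests that would take the subsystem below its minimum
    capacity, and approves at most max_step servers per step so that a
    mistyped count cannot drain capacity in one command.
    """
    if requested <= 0:
        raise UnsafeRemovalError(f"requested count must be positive, got {requested}")
    remaining = subsystem.active_servers - requested
    if remaining < subsystem.min_required:
        raise UnsafeRemovalError(
            f"{subsystem.name}: removing {requested} would leave {remaining} servers, "
            f"below the minimum of {subsystem.min_required}"
        )
    # Approve only the first batch; the operator re-runs the tool for the next one.
    return min(requested, max_step)


# Made-up numbers for illustration.
index = Subsystem(name="index", active_servers=100, min_required=80)
print(plan_removal(index, 3))  # prints 3
try:
    plan_removal(index, 30)  # an intended "3" typed as "30"
except UnsafeRemovalError as err:
    print(f"refused: {err}")
```

The design choice is to validate before acting: the typo still happens, but the tool turns it into a refused request instead of an outage.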

How It Works

Step 1: Select a Scenario

Choose from our library of realistic incident simulations.

Step 2: Assemble Your Team

Gather your engineers and assign roles for the simulation.

Step 3: Run the Drill

Experience the incident in a controlled environment.

Step 4: Analyze and Improve

Debrief the simulation and identify areas for improvement.
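A debrief is easier to ground when the drill produces a timestamped log of what the team did. The following is a minimal sketch, assuming a hypothetical log format: `DrillEvent`, the event kinds, and `debrief` are illustrative names, not Incident Drill's actual API. It turns such a log into time-to-detect, time-to-mitigate, and time-to-resolve figures measured from the first injected failure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class DrillEvent:
    at: datetime
    kind: str  # hypothetical kinds: "inject", "detected", "mitigated", "resolved"
    actor: str = ""


def first_time(events: List[DrillEvent], kind: str) -> Optional[datetime]:
    """Return the earliest timestamp recorded for the given event kind, if any."""
    times = [e.at for e in events if e.kind == kind]
    return min(times) if times else None


def debrief(events: List[DrillEvent]) -> Dict[str, Optional[timedelta]]:
    """Measure time from the first injected failure to each response milestone."""
    start = first_time(events, "inject")
    if start is None:
        raise ValueError("drill log has no 'inject' event")
    metrics: Dict[str, Optional[timedelta]] = {}
    for milestone in ("detected", "mitigated", "resolved"):
        reached = first_time(events, milestone)
        # None means the team never reached this milestone during the drill.
        metrics[f"time_to_{milestone}"] = reached - start if reached is not None else None
    return metrics


# A made-up drill log for illustration.
t0 = datetime(2024, 1, 15, 10, 0)
log = [
    DrillEvent(t0, "inject"),
    DrillEvent(t0 + timedelta(minutes=7), "detected", "on-call engineer"),
    DrillEvent(t0 + timedelta(minutes=41), "mitigated", "storage lead"),
]
for name, elapsed in debrief(log).items():
    print(name, elapsed)
```

With this sample log the sketch reports 0:07:00 to detection, 0:41:00 to mitigation, and None for resolution; an unreached milestone is a natural starting point for the debrief discussion.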

Ready to Level Up Your Incident Response?

Join the Incident Drill waitlist and be among the first to experience the future of incident preparedness.

Get Early Access
  • Founding client discounts
  • Shape the roadmap
  • Direct founder support

Join the Incident Drill waitlist

Drop your email and we'll reach out with private beta invites and roadmap updates.