Learn from Datadog
When a Simple Upgrade Wiped Out Datadog's Kubernetes Network
In March 2023, Datadog experienced a major multi-region outage after an automatic systemd update restarted systemd-networkd and removed network routes managed by Cilium. Incident Drill helps your team prepare for similar high-stakes scenarios through realistic incident simulations and collaborative learning.
WHY TEAMS PRACTICE THIS
Master Kubernetes Incident Response
- ✓ Reduce downtime and MTTR
- ✓ Improve team communication and collaboration
- ✓ Enhance incident response skills
- ✓ Identify vulnerabilities in your infrastructure
- ✓ Build confidence in handling critical incidents
- ✓ Minimize the impact of future outages
How It Works
Step 1: Simulate
Run a realistic simulation of the Datadog Cilium outage.
Step 2: Investigate
Diagnose the root cause and identify the misconfiguration.
Step 3: Collaborate
Work with your team to develop a mitigation strategy.
Step 4: Resolve
Implement the fix and restore network connectivity.
Ready to Master Incident Response?
Join the Incident Drill waitlist and be among the first to experience realistic incident simulations and collaborative learning. Prepare your team for anything!
Get Early Access →
- ✓ Founding client discounts
- ✓ Shape the roadmap
- ✓ Direct founder support