Learn from Slack
The Day Slack Went Silent: Mastering Cloud Scaling After the 2021 Outage
On January 4, 2021, the first workday of the year, Slack suffered a major outage that affected millions of users. Incident Drill helps your team practice responding to similar cloud scaling failures so you can avoid costly downtime.
WHY TEAMS PRACTICE THIS
Unlock Peak Performance Under Pressure
- ✓ Reduce Mean Time to Resolution (MTTR)
- ✓ Improve System Reliability and Uptime
- ✓ Enhance Team Communication and Collaboration
- ✓ Proactively Identify Infrastructure Weaknesses
- ✓ Build Confidence in Your Incident Response
- ✓ Minimize the Impact of Future Outages
How It Works
Step 1: Simulate the Surge
Recreate the initial traffic spike that triggered the Slack outage.
Step 2: Identify the Bottleneck
Trace the failure to the AWS Transit Gateways, which could not scale up fast enough to absorb the surge.
Step 3: Implement Scaling Solutions
Test different scaling strategies to handle the increased load.
Step 4: Validate and Monitor
Ensure your solutions are effective and proactively monitor for future issues.
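To make the four steps concrete, here is a minimal sketch of a toy surge model in Python. It is an illustration under stated assumptions, not Slack's architecture or Incident Drill's actual simulation: the function name `simulate_surge` and the parameters `initial_capacity`, `scale_step`, and `scale_delay` are hypothetical, and the traffic numbers are made up. It shows how capacity that scales with a delay drops requests during a sudden ramp, and how a shorter delay reduces the loss.

```python
def simulate_surge(demand, initial_capacity, scale_step, scale_delay):
    """Toy model (hypothetical): return requests dropped at each tick.

    A scale-up of `scale_step` is requested when load exceeds capacity,
    but only takes effect `scale_delay` ticks later. Only one scale-up
    can be in flight at a time.
    """
    capacity = initial_capacity
    pending = []   # ticks at which a requested scale-up becomes effective
    dropped = []
    for tick, load in enumerate(demand):
        # Apply any scale-ups that have finished provisioning.
        ready = [t for t in pending if t <= tick]
        capacity += scale_step * len(ready)
        pending = [t for t in pending if t > tick]

        # Requests above current capacity are dropped.
        dropped.append(max(0, load - capacity))

        # Request more capacity if overloaded and nothing is in flight.
        if load > capacity and not pending:
            pending.append(tick + scale_delay)
    return dropped


# Made-up traffic ramp: quiet start, then a sharp surge.
demand = [100, 200, 400, 400, 400, 400]

slow = simulate_surge(demand, initial_capacity=150, scale_step=150, scale_delay=2)
fast = simulate_surge(demand, initial_capacity=150, scale_step=150, scale_delay=1)

print("slow scaling drops:", slow, "total:", sum(slow))
print("fast scaling drops:", fast, "total:", sum(fast))
```

In this toy run, the slower scaling delay drops more requests than the faster one for the same ramp. That is the kind of comparison Steps 3 and 4 ask for: change one scaling parameter, rerun the surge, and check the result.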
Ready to Prevent Your Own Outage?
Join the Incident Drill waitlist and be the first to access simulations based on real-world incidents like the Slack New Year Outage. Prepare your team for anything.
Get Early Access →
✓ Founding client discounts
✓ Shape the roadmap
✓ Direct founder support