Guide on-call engineers from alert detection through severity triage to resolution and review.
Free to start · Fully editable · Export to SVG, PNG, GIF & MP4
8 connected components you can rename, recolor, and extend with AI.
An incident response decision tree guides an on-call engineer through the steps of handling a production incident. The root is alert detection, which leads to severity triage and then to paths for declaring an incident, assigning an incident commander, mitigating the issue, and communicating with stakeholders, ending in resolution and a postmortem. Each branch point is a decision based on the incident's impact.
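The tree described above can be modeled as linked steps, where most steps have a single next step and decision points branch on impact. The sketch below is a minimal illustration, not the template's internal format. The `Step` class, the `build_tree` and `walk` helpers, the two-way "high"/"low" split, and placing the decision right after the incident is declared are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One node in the tree; `branches` maps a decision outcome to the next step."""
    name: str
    branches: dict[str, "Step"] = field(default_factory=dict)


def build_tree() -> Step:
    """Wire up the steps; only the post-declaration step branches on impact (assumption)."""
    postmortem = Step("postmortem")
    resolve = Step("resolve", {"next": postmortem})
    communicate = Step("communicate to stakeholders", {"next": resolve})
    mitigate = Step("mitigate", {"next": communicate})
    commander = Step("assign incident commander", {"next": mitigate})
    # Decision point: high impact gets a commander, low impact goes straight to mitigation.
    declare = Step("declare incident", {"high": commander, "low": mitigate})
    triage = Step("triage severity", {"next": declare})
    return Step("detect alert", {"next": triage})


def walk(root: Step, impact: str) -> list[str]:
    """Follow the tree from the root and return the names of the steps visited."""
    path = []
    node = root
    while True:
        path.append(node.name)
        if not node.branches:
            return path  # leaf reached: in this tree, the postmortem
        # Unconditional edges use the "next" key; decision points use the impact label.
        key = "next" if "next" in node.branches else impact
        node = node.branches[key]
```

For example, `walk(build_tree(), "low")` visits the same steps as `walk(build_tree(), "high")` except for assigning an incident commander. Keeping branches as a dictionary makes it easy to add further outcomes (such as a separate path for false-positive alerts) without changing `walk`.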
SRE teams, DevOps engineers, and incident commanders use it to standardize on-call behavior, reduce mean time to resolution, and train new responders. It anchors runbooks, on-call playbooks, and reliability training for teams adopting formal incident management.
An incident response decision tree is a branching flowchart that guides on-call engineers through detecting, triaging, mitigating, and resolving a production incident based on its severity.
The core steps are detection, severity triage, declaring an incident, assigning an incident commander for higher severities, mitigation, stakeholder communication, resolution, and a postmortem review.
Severity is set by the impact on users and revenue; higher severities trigger the assignment of an incident commander, broader paging, and formal stakeholder communication.
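A severity rule of this kind can be sketched as a function from impact to a level, plus a table of what each level triggers. This is a minimal illustration under stated assumptions: the `Impact` fields, the 25% and 5% user thresholds, the rule that any revenue risk is severity 1, and the `POLICY` table are all hypothetical and would be replaced by a team's own criteria.

```python
from dataclasses import dataclass


@dataclass
class Impact:
    users_affected_pct: float  # share of users seeing errors, 0-100 (hypothetical field)
    revenue_at_risk: bool      # e.g. a checkout or billing path is degraded (hypothetical field)


# Hypothetical escalation policy per severity (1 = most severe).
POLICY = {
    1: {"commander": True, "paging": "all responders", "comms": "formal status updates"},
    2: {"commander": True, "paging": "owning team and escalation lead", "comms": "formal status updates"},
    3: {"commander": False, "paging": "primary on-call only", "comms": "team channel"},
}


def classify(impact: Impact) -> int:
    """Map impact to a severity level using illustrative thresholds."""
    if impact.revenue_at_risk or impact.users_affected_pct >= 25:
        return 1
    if impact.users_affected_pct >= 5:
        return 2
    return 3


def escalation(impact: Impact) -> dict:
    """Return the response that the computed severity triggers."""
    return POLICY[classify(impact)]
```

Separating the classification rule from the policy table lets a team tune thresholds and escalation behavior independently, and makes the higher-severity triggers (commander, broader paging, formal communication) explicit in one place.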
A blameless postmortem captures root cause and action items so the team can prevent recurrence and reduce mean time to resolution.
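Mean time to resolution (MTTR) is commonly computed as the average time from detection to resolution across incidents, which gives a simple way to check whether postmortem action items are paying off. The sketch below assumes each incident is recorded as a `(detected, resolved)` timestamp pair; some teams measure from the start of user impact instead of detection, so the starting timestamp is an assumption. The function name and the sample incidents are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average the (resolved - detected) durations across incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Two made-up incidents lasting 45 and 75 minutes.
history = [
    (datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 45)),
    (datetime(2024, 2, 9, 14, 0), datetime(2024, 2, 9, 15, 15)),
]
mttr = mean_time_to_resolution(history)  # 1:00:00
```

Tracking this value per quarter, alongside how many postmortem action items were completed, is one way to see whether reviews are actually shortening incidents.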
Open the incident response decision tree in the Infogiph canvas, then edit, animate, and export.