Incident management for high-velocity teams
Bringing order to chaos: The role of the incident commander
What is an incident commander (IC) and why do you need one?
It’s no secret that major incidents have a big impact on a company’s bottom line, which is why incident management is an essential and ever-evolving part of any ITSM practice. But when an incident strikes, who’s responsible for getting systems back up and running?
The general answer is usually IT or DevOps. But no matter which department or departments handle major incidents, the person at the helm of resolution is typically your incident commander.
What is an incident commander?
An incident commander—also known as an incident manager—is a member of the IT or DevOps team who is responsible for managing incident response. This person’s priority is to guide an incident to its resolution as quickly and completely as possible, managing the resources, plan, and communication involved in that resolution.
The term is also used by firefighters and US emergency response teams and, while the stakes are often higher in those scenarios, the role remains the same. The incident commander is always the go-to person with the final say on all things related to the incident.
Why do teams need an incident commander?
Your incident commander is the primary point of contact and source of truth about your incident. They see the big picture, manage all the moving pieces, know what’s been tried and what’s still on the radar, and plan for and manage next steps.
Without an incident commander, communication and teamwork break down. It’s easy for teams to do duplicate work without knowing it, miss big-picture concerns, and fail to communicate quickly and accurately with system users, internal stakeholders, leadership, and each other. The larger and more complex an organization’s technology or team structures are, the more essential this role is to a healthy incident management practice.
The duties of an incident commander
Incident preparation
Incident commanders are responsible for setting up communication channels, inviting the appropriate people into those channels during an incident, and training team members on best practices for not only incident management, but also communication during an incident.
Decision-making
ICs are responsible for quickly assessing an incident and making decisions about what to do, which team members are needed, and what actions come next at every stage of the resolution process. They should be good listeners, well-versed in gathering, synthesizing, and prioritizing expert recommendations.
The best incident commanders are confident decision-makers with strong problem-solving skills.
Delegation
ICs must delegate tasks to their teams and know when to expand the team by pulling in additional developers, communication experts, etc.
Oversight
While developers get down in the weeds to figure out what caused an incident and how to resolve it in the code, an incident commander should be looking at the big picture. What has already been tried? What worked last time? What is the next best step if the current strategy doesn’t work?
Incident commanders are responsible for overseeing the process from start to finish, asking the right questions, getting regular status reports from each team member, and prioritizing next steps.
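To make the oversight questions above concrete, here is a minimal Python sketch of the kind of running log an IC might keep: which actions have been tried, whether they worked, and which responders are overdue for a status report. Everything in it is hypothetical and for illustration only, including the `IncidentLog` and `Attempt` names, the 15-minute reporting interval, and the responder names. It is not taken from any particular incident management tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Attempt:
    """One action the team tried, and whether it helped."""
    description: str
    worked: bool


@dataclass
class IncidentLog:
    """Hypothetical log an IC might keep; not part of any real tool."""
    # Assumed reporting cadence; pick whatever your process calls for.
    report_interval: timedelta = timedelta(minutes=15)
    attempts: list[Attempt] = field(default_factory=list)
    last_report: dict[str, datetime] = field(default_factory=dict)

    def record_attempt(self, description: str, worked: bool) -> None:
        """Note an action so nobody repeats it unknowingly."""
        self.attempts.append(Attempt(description, worked))

    def record_status(self, responder: str, at: datetime) -> None:
        """Store the time of a responder's most recent status report."""
        self.last_report[responder] = at

    def already_tried(self, description: str) -> bool:
        """Answer "what has already been tried?" for one action."""
        return any(a.description == description for a in self.attempts)

    def overdue(self, now: datetime) -> list[str]:
        """Return responders whose last report is older than the interval."""
        return sorted(
            name for name, at in self.last_report.items()
            if now - at > self.report_interval
        )


if __name__ == "__main__":
    log = IncidentLog()
    start = datetime(2024, 1, 1, 12, 0)
    log.record_status("alice", start)
    log.record_status("bob", start + timedelta(minutes=10))
    log.record_attempt("restart api gateway", worked=False)

    print(log.already_tried("restart api gateway"))
    print(log.overdue(start + timedelta(minutes=20)))
```

In practice this information usually lives in a shared incident channel or ticket rather than in code. The sketch only shows the two things the IC needs to be able to answer at any moment: what has been tried, and who has not reported in recently.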
Team alignment
The bigger an incident, the more likely you are to have multiple teams working on a resolution. An IC oversees communication and makes sure everyone is on the same page. They should also keep conversations focused and brief to minimize time to resolution.
Panic management
Incidents are high-stakes, high-stress events, and studies show that stressed-out people make worse decisions. That is why part of the incident commander’s job is to keep teams calm and focused.
The IC should be able and willing to pull highly stressed people off the incident team, talk the team down as needed, and consistently bring the focus back to the task at hand. They should also, when possible, take any additional stress burden off their teams by heading off the steady stream of questions and panic coming from internal and external stakeholders.
Escalation and resource management
When needed, incident commanders are responsible for escalating issues to more senior or specialized developers and/or bringing in additional resources to speed up resolution.
Planning
Both before and during an incident, an IC should have next steps and backup plans ready to go.
Post-mortems
Once an incident has been resolved, the incident commander is responsible for the post-mortem process, including creating documents where teams can share their thoughts, planning post-mortem meetings, and making recommendations on how to prevent or lessen the impact of future incidents.
Becoming an incident commander
The core responsibilities of an incident commander are resource management, communication, and problem-solving. Anyone with these skills—from senior leadership all the way down to interns—can make a great incident commander.
Typically, requirements for incident commanders include:
- Strong communication skills
- A high-level knowledge of incident management best practices and systems
- Problem-solving skills
- The ability to make quick, confident decisions
- Listening and synthesis skills
- Previous experience with major incidents (either as a participant or an observer)
- Leadership skills—the ability to take command in a high-stress situation
Before you become an incident commander, most companies will have you shadow other ICs to learn the ropes. In these cases, the best practice is to watch and learn quietly and hold any questions until the incident is resolved.
Best practices for incident commanders
Keep up with industry best practices
Since incident commanders are responsible for guiding teams successfully through incidents, they should be well acquainted with incident response best practices and incident communication best practices. Atlassian’s Incident Management Handbook is another helpful resource.
Plan ahead
It’s also essential to have a strategic plan for incidents before they happen. The better documented your process is before an incident, the easier it will be for the IC and teams to follow in the more intense, higher-stress environment an incident creates.
Know your teams
Understanding team dynamics and the strengths and weaknesses of people on your teams leads to better delegation and faster incident resolution.
Stay on task
Even during a major incident, team calls and Slack conversations can get off track. The IC should be ready to stop tangents in their tracks and refocus the team on the task at hand.
Sometimes all this takes is a quick verbal or written reminder. Sometimes it means pulling people off the team or bringing new people in. The best ICs are even willing to remove the CEO or their boss from a call if that person is becoming a distraction.
Keep calm
The best ICs are people who can stay cool and focused in a crisis. If this doesn’t come naturally to an IC, it’s something that can be practiced and improved.
Prioritize post-mortems
Once an incident is resolved, the IC should run a blameless post-mortem to identify how the team can improve incident management and overall systems in the future. The best ICs not only guide incidents calmly toward resolution but also help the company learn from the incident and make improvements.
Conclusion
Every incident commander can benefit from a strong service management solution. Jira Service Management enhances communication, centralizes alerting, and incorporates knowledge base articles.