Creating postmortem reports
Why collecting and documenting data is key to the incident postmortem process
An incident postmortem has two distinct parts: the meeting where the incident is discussed, and the postmortem report created as an output of that meeting.
People often use the word “postmortem” for either the meeting or the report, or for both at once.
But there is a difference between the postmortem meeting and the written postmortem report.
At Atlassian, we typically use postmortem, or incident postmortem, to describe the entire process of analyzing an incident, including:
- Running an incident postmortem meeting
- Capturing actions and information during the meeting
- Getting approval on follow-up actions and communicating the outcome of the meeting
Read more about how Atlassian manages postmortems in our incident management handbook.
What makes for a good incident postmortem report?
Clear and consistent prompts
A good report should be based on a clear and consistent framework. Effective teams base every postmortem on a template, where participants answer a set of questions or prompts.
This ensures key details aren’t forgotten. It also builds consistency across incidents, and helps the team identify patterns, trends, and opportunities for improvement. The framework can be iterated and improved on over time, but any changes should be intentional.
Rich details and data
Postmortem fields aren’t places to skimp on details or gloss over events. This is where you want to get very granular and specific. Don’t say you saw a traffic spike; say precisely which metric changed and by how much. Don’t say the team was confused; pull in an exact quote from the chat history where someone expressed confusion.
Inclusive, blameless language
Like many teams, we practice blameless postmortems here at Atlassian. It’s important to keep finger-pointing out of the meeting and the analysis of the incident. Take the same care with the words written in the report. Avoid language that dishes out blame or singles people out.
Important questions to ask in a postmortem report
Here are the prompts included in Opsgenie’s postmortem feature:
- Leadup
Describe the circumstances that led to this incident
- Fault
Describe what failed to work as expected
- Detection
Describe how the incident was detected
- Root causes
Run a 5-whys analysis to understand the true causes of the incident
- Mitigation and resolution
What steps did you take to resolve this incident?
- Lessons learnt
What went well? What could have gone better? What else did you learn?
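As a rough illustration, a team could encode these prompts as a reusable template and check a draft for unanswered ones. This is a minimal Python sketch, not Opsgenie's API: the `PostmortemReport` class, its field names, the `missing_prompts` helper, and the sample text are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class PostmortemReport:
    """Hypothetical record with one field per prompt listed above."""
    leadup: str = ""
    fault: str = ""
    detection: str = ""
    root_causes: str = ""
    mitigation_and_resolution: str = ""
    lessons_learnt: str = ""


def missing_prompts(report: PostmortemReport) -> list[str]:
    """Return the names of prompts that are empty or whitespace-only."""
    return [f.name for f in fields(report) if not getattr(report, f.name).strip()]


# Invented example draft with only two prompts answered so far.
draft = PostmortemReport(
    leadup="A config change was rolled out to all regions at once.",
    fault="API requests returned errors after the rollout.",
)
print(missing_prompts(draft))
# ['detection', 'root_causes', 'mitigation_and_resolution', 'lessons_learnt']
```

Keeping the prompts in one definition means every report gets the same fields, and any change to the template is a deliberate edit rather than drift.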
Check out our article on postmortem templates for more example questions to include in a postmortem report.
What else to include in a postmortem report
- Screenshots
Attach relevant screenshots, especially ones the response team took during the outage. What did you see change in the product? What product behavior didn’t happen as expected?
- Tickets
Link to any relevant tickets related to the incident.
- Customer feedback
Did any customer feedback come in about the incident? It could have been reported in places like a help desk, over email, or on social media. Don’t worry about including all of it.
- Charts and graphs
What data visualizations help show the impact of the incident?
- Data
Any other key data points about the incident or its impact?
- Chat exchanges
If the team uses a chat tool like Slack during the response effort, consider including any key messages or exchanges from the chat history.
- Timelines
A clear timeline of the incident is an excellent aid for incident analysis. What were the key events during the incident, and what were their timestamps?
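For teams that gather events from several sources (alerts, chat, tickets), a timeline can be assembled by sorting on timestamp. The Python sketch below is only illustrative: the `build_timeline` helper and the sample events are invented for this example, and it assumes every timestamp is already in UTC.

```python
from datetime import datetime, timezone


def build_timeline(events):
    """Sort (timestamp, description) pairs chronologically and format each
    with the minutes elapsed since the first event."""
    ordered = sorted(events, key=lambda e: e[0])
    if not ordered:
        return []
    start = ordered[0][0]
    lines = []
    for ts, description in ordered:
        minutes = int((ts - start).total_seconds() // 60)
        lines.append(f"{ts:%H:%M} UTC (+{minutes} min) {description}")
    return lines


# Invented example events, deliberately out of order.
events = [
    (datetime(2024, 1, 1, 10, 42, tzinfo=timezone.utc), "Rollback completed"),
    (datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc), "Alert fired"),
    (datetime(2024, 1, 1, 10, 12, tzinfo=timezone.utc), "Incident declared"),
]
for line in build_timeline(events):
    print(line)
# 10:05 UTC (+0 min) Alert fired
# 10:12 UTC (+7 min) Incident declared
# 10:42 UTC (+37 min) Rollback completed
```

Showing the elapsed time next to each event makes gaps, such as the time between detection and declaration, easy to spot during the meeting.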
Internal vs. external postmortem reports
Some organizations also publish a public version of a postmortem after an incident. This is less common overall, but it happens most often with large-scale consumer services whose outages affect a lot of users. They might publish the full postmortem report or, more likely, a trimmed-down version of the internal report. Sensitive or private information usually needs to be removed first.