It takes a village to respond to and resolve incidents. But the teams involved in incident response often work in silos: SREs and devs are heads down fixing the problem, support is flooded with emails/tickets, and marketing/PR may be putting out fires on Twitter. Even if there’s some communication happening over chat or across desks, there’s typically room to improve how these teams work together when it matters most.
When things are on fire, the last thing on your mind is how to improve internal communication and collaboration. And even after resolution, the follow-up work stays fragmented. Technical teams may sit down for a post-incident review and support may look back on ticket responses, but they rarely come together to assess how they did as a complete incident response team and how they can improve next time. That’s why it’s extra important to set aside time when things are going right to open up a dialogue between teams and improve before the next incident strikes.
We’ve discovered firsthand how helpful it is to bring technical and not-as-technical teams together before, during, and after an incident. Staying siloed means crossed wires, mixed messages, and hurt feelings. Suddenly you have two incidents: the original issue and the internal chaos that ensues. Take a look at what we learned by bringing Atlassian teams together, then check out our top tips and the workshops we created to help teams at your company unite for a better incident response, too.
Perks of uniting technical and non-technical teams during incident response
- Learn from diverse experiences and perspectives: Bringing teams together for conversations that wouldn’t otherwise occur enables them to learn from each other’s successes and mistakes. No need to reinvent the incident response wheel if folks have already established best practices based on previous experiences.
- Discover gaps in the incident response process before they impact users: Different teams often deal with different pieces of incident response. By connecting the dots before an incident strikes, you’re able to discover holes in the process before they affect your users. For example, only after chatting with the SRE team that had helped resolve an error did one of our support teams discover that the in-product error message was pointing users to a dead-end page rather than to a status page with more information. A simple link swap meant fewer frustrated customers and fewer support tickets during the next incident.
- Communicate more clearly and build trust with customers: When teams are working in silos, customer updates are often happening in silos too. But the last thing you want to give frustrated customers is mixed or confusing updates. Teams that work together to create an incident communication plan are able to stay on-message and provide customers with consistent updates. The better the communication, the more customer trust and loyalty you will build.
- Build empathy for other teams: It’s easy to think your team has it rough when you are in the trenches day in and day out. By learning about the struggles and successes of other teams, you’ll build empathy and respect that translate into greater trust and transparency across the company.
Now that we’ve covered the benefits of bringing different teams together, it’s time to unify! Use these tips and tricks to get teams working together like peas in an incident response pod.
Better together: Tips for connecting incident response teams
- Start and stay on the same page: Schedule time to sync up with people outside of your immediate team to share insights and build relationships. We suggest monthly or weekly stand-up meetings.
- Create well-defined/documented incident roles: Having clear ownership in place before an incident will help make sure things are handled efficiently by both technical and non-technical teams. It will also help quell the panic and confusion that often comes up when sh*t hits the fan.
- Invite non-technical folks to your post-mortems/post-incident reviews: Anyone who played a role in incident response (answering support tickets, tweeting to customers, resolving the incident, etc.) should be a part of your review. Having the complete picture is important when trying to learn and improve. If there were a lot of comms during the incident, consider holding a communications review in addition to a post-incident review.
- Be mindful of team-specific jargon: Try to steer clear of acronyms or other specialized language that can be confusing or off-putting for teams new to your domain.
- Align with what matters most during incident response: It’s key to have a set of mutually agreed-upon values to help guide decision-making during an incident. A set of shared incident values helps maintain efficiency and consistency in more subjective situations.
Ready to give it a shot?
We’ve rolled up some of our best practices for improving incident response into Atlassian Team Playbook plays (team-building activities you can easily facilitate with your own team).
Need a bit more convincing? Here’s what a member of our support team had to say after running our incident communications play:
The incident communication play was a fantastic exercise to bridge the post-incident review gap between technical and non-technical teams. Going through this exercise helped shine the light on improvements we can make to things like communication cadence, and making sure all teams involved are on the same page for future incidents. — Senior Support Engineer, Bitbucket
Try one (or all!) and let us know what you think by tweeting us @Statuspage.