Last summer, we made a bet on the importance of incident communications when we acquired Statuspage and added the first product to our suite to specifically address incident management and communications. We saw early on that providing status and regularly communicating with customers — especially during incidents — had become a critical part of the software delivery process.
In the world of cloud and DevOps, incidents are becoming more frequent, more complex to resolve, and greater in impact, as we saw with the recent AWS outage that knocked thousands of web services offline for several hours. As the rate of incidents increases and more of the “infallible” pieces of the internet like Dyn or AWS S3 go down, businesses need to have an incident management plan in place (including people, processes, and technology) so that they can swiftly manage an incident from start to finish.
Atlassian already offers essential pieces of the incident management process: Hipchat to organize a “war room” and communicate updates, Statuspage to alert internal and external stakeholders to what’s happening, Jira Service Desk to be the incident system of record, and Jira Software to track follow-on remediation actions so incidents don’t get repeated.
At Atlassian, we’ve worked with many of our customers to understand how they are adjusting to this new world and using our products in different aspects of incident management. The consensus is that with resolution timelines shrinking and urgency increasing, legacy service desk and collaboration tools simply are not cutting it. Rapid and efficient collaboration and communication are absolutely critical when you’re measuring downtime in minutes and not hours or days.
Today, we’re excited to announce a new set of strategic integrations with PagerDuty to provide teams with an incident management workflow to respond, organize, and remediate when an outage or incident occurs. This launch with PagerDuty joins our existing efforts with partners xMatters and OpsGenie to bring customers best-in-breed integrations across their toolsets.
Why Incident Management?
Chances are that no matter what role you work in, you’ve noticed more things breaking than usual, and your IT or SRE teams seem to constantly be putting out fires.
There are two reasons that incidents and outages are becoming more regular: the rise of DevOps and the shift from buying infrastructure to renting it via cloud services. Companies encourage a focus on speed, and have turned to modern software practices like DevOps and the use of third-party cloud services to achieve it, allowing teams to iterate and innovate faster than ever before. But as the speed of deployment and reliance on outside vendors increase, a surge of incidents follows in their wake. In fact, Statuspage customers opened and resolved nearly 200,000 incidents in 2016 alone for a total of over 1 million hours of downtime!
During downtime or an outage, efficiency is key to incident teams whose measure of success is time to resolution, so they can’t be slowed down by context-switching across multiple tools or having to re-enter information. Having a well-integrated incident management tool chain is critical, and Atlassian has invested significantly in developing best-in-class integrations with our partners, PagerDuty, xMatters, and OpsGenie, to offer incident management teams a consistent workflow throughout the incident management process.
ChatOps for Incident Management with PagerDuty
We’re excited to announce a new set of strategic integrations with PagerDuty, starting with solving one of the largest issues for incident response teams: communication. Your IT Ops team already lives in Hipchat, so it makes perfect sense to get your mission-critical alerts there. PagerDuty is a leader in managing escalations and alerts during incidents, which is complementary to what Atlassian provides: a place for rapid response teams to communicate and collaborate on a solution.
PagerDuty’s Hipchat integration sends you rich incident notifications right where you’re already working. No need to search through a bunch of different apps for context—everything is already right at your fingertips in an easy-to-scan feed in your right sidebar. In addition, this powerful integration allows you to:
- Use slash commands to fix issues right from Hipchat, turning your chat room into a command center. (Why take minutes to solve a problem when you can take just seconds?)
- Set up your alerts with just a few clicks. In the PagerDuty Extensions Portal, you can map multiple services to individual Hipchat rooms, so the right people can see and respond to incident notifications.
- Sign in to PagerDuty from Hipchat, ensuring that only users with the necessary permissions can take actions within Hipchat. This also logs who took what action, promoting better security and incident analytics.
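To make the ideas in the list above concrete, here is a minimal sketch of mapping services to rooms and recording who took what action. It is an illustration only, not PagerDuty's or Hipchat's actual API: the names `SERVICE_ROOMS`, `IncidentRouter`, `rooms_for`, and `act`, along with the service and room names, are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical mapping of monitored services to the chat rooms
# that should receive their incident notifications.
SERVICE_ROOMS = {
    "checkout-api": ["ops-war-room", "payments"],
    "search": ["ops-war-room"],
}


@dataclass
class IncidentRouter:
    """Toy model: route notifications by service and audit every action."""

    service_rooms: dict
    authorized_users: set
    audit_log: list = field(default_factory=list)

    def rooms_for(self, service):
        # Services with no mapping notify no rooms.
        return self.service_rooms.get(service, [])

    def act(self, user, action, incident_id):
        # Only signed-in, authorized users may act; every attempt,
        # allowed or not, is recorded for later analysis.
        allowed = user in self.authorized_users
        self.audit_log.append({
            "user": user,
            "action": action,
            "incident": incident_id,
            "allowed": allowed,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return allowed
```

In this sketch, an attempt by an unauthorized user is refused but still logged, which is one way to support both the security and the analytics goals mentioned above.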
“The increased velocity of changes made in increasingly complex environments does not need to result in less reliable services. Development and Ops teams implementing best practice incident response processes and tools can minimize customer impact, and even address issues before customers notice. PagerDuty’s integration with Atlassian Hipchat creates a seamless workflow that organizes and automates incident response teams and activities, so incidents are identified and resolved faster, and ultimately prevented,” said Rachel Obstler, Vice President of Product Management at PagerDuty.
Enhanced communication and centralized operations—in other words, the essence of what makes ChatOps—are absolutely crucial to effective incident management. The PagerDuty integration helps developers take ChatOps to the next level. We’ll continue to work with PagerDuty on a number of exciting incident management capabilities and look forward to building on our partnership. You can find out more at PagerDuty.
xMatters Automates the Incident Management Workflow
We partnered with xMatters to integrate their leading integration-driven collaboration software across Hipchat, Jira Service Desk, and Statuspage—a powerful solution that brings together the tools and people needed to manage an incident. This partnership integrates the right people into your toolchains spanning DevOps, Ops, and service management solutions, and automates communications so you can proactively prevent outages, rapidly engage resolvers, and manage major incidents—all within the Atlassian toolset.
See how it works in this video:
[youtube https://www.youtube.com/watch?v=t0OKBq3QoyE&w=560&h=315]
We’ve also partnered with xMatters on a recent DevOps Maturity Research Report. Check out the full findings here: Atlassian and xMatters DevOps Maturity Survey.
Demo an incident in the OpsGenie Atlassian playground
Supporting software ecosystems and integrations is at the heart of what OpsGenie does, and together with the Atlassian product suite, it can help your team collaborate like no other. OpsGenie integrates across the Atlassian product suite, enabling a complete incident management workflow. Today, it’s also announcing the release of the OpsGenie Playground for Atlassian, where you can create an incident and walk through the process of resolution all the way to remediation, using all your Atlassian tools.
OpsGenie has ready-to-use integrations for hundreds of monitoring systems. When any of these systems notices an incident, OpsGenie can create alerts to notify the right people. From these alerts, OpsGenie supports creating and managing Jira issues automatically, notifications and actions via Hipchat, automatic updates to Statuspage, and more!
And you can try it all out for yourself in their Playground for Atlassian.