For customer-facing SaaS companies, setting up an alerting tool is a no-brainer. In the current climate of always-on services, companies need assurance that customers are getting the service they demand and expect – all the time. But many organizations still struggle to notify the right people at the right time. If your data center is on fire and you alert Karen while she’s vacationing in the Greek Isles, you (and poor Karen) have a problem.
Opsgenie recognizes that alerting is worthless unless you’re reaching stakeholders who are in a position to help. Through the use of on-call schedules, escalations, routing rules, and deep integrations, we empower teams to notify the right people at the right time, while limiting alert fatigue.
Here’s how to make the most of Opsgenie’s alerting features:
On-call schedules
Once your team is set up in Opsgenie, the next step is to create an on-call schedule. Our flexible tooling allows for daily, weekly, or custom (including follow-the-sun) schedules. Below, you’ll see an example of a typical schedule. The rotations are broken out into business hours and after hours. Routing rules then direct each alert to the schedule. Routing to an on-call schedule means that Opsgenie notifies the person on call when the alert comes through, rather than notifying someone who is on vacation, out sick, or simply not scheduled to work.
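To make the idea concrete, here is a minimal sketch of how a schedule split into business-hours and after-hours rotations might resolve "who is on call right now." It is a conceptual model only, not Opsgenie's API or implementation: the `Rotation` class, the 9-to-5 boundary, the weekly handoff, and the names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List, Optional


@dataclass
class Rotation:
    name: str
    start: time              # shift start, inclusive
    end: time                # shift end, exclusive; may wrap past midnight
    participants: List[str]  # hypothetical names, rotated weekly

    def covers(self, moment: datetime) -> bool:
        """True if the moment falls inside this rotation's daily window."""
        t = moment.time()
        if self.start <= self.end:
            return self.start <= t < self.end
        # Window wraps past midnight, e.g. 17:00 to 09:00.
        return t >= self.start or t < self.end

    def on_call(self, moment: datetime) -> str:
        """Pick a participant by ISO week number (assumed weekly handoff)."""
        week = moment.isocalendar()[1]
        return self.participants[week % len(self.participants)]


def who_is_on_call(rotations: List[Rotation], moment: datetime) -> Optional[str]:
    """Return the on-call person from the first rotation covering the moment."""
    for rotation in rotations:
        if rotation.covers(moment):
            return rotation.on_call(moment)
    return None


# Hypothetical schedule mirroring the business-hours / after-hours split.
SCHEDULE = [
    Rotation("business hours", time(9), time(17), ["alice", "bob"]),
    Rotation("after hours", time(17), time(9), ["carol", "dave"]),
]

print(who_is_on_call(SCHEDULE, datetime(2024, 1, 3, 23, 0)))
```

The point of the model is that an alert targets the schedule, not a person, so nobody outside the covering rotation is ever selected.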
Escalation policies
An escalation policy determines who is notified about an open alert or incident, when, and how. The escalation policy shown below is a typical example of how you might choose to route your alerts. Alerts that come in after hours are routed to a specific schedule, and whoever is on call is contacted. In this case, the alert is first routed to a schedule, and then the notification goes to all team members until it is acknowledged, ensuring that the Service Level Agreement for P1s is met. This escalation first notifies the entire team simultaneously. If no one acknowledges the alert, Opsgenie next notifies each user individually. If the alert is still not acknowledged, the team admins are notified. The escalation cycle repeats until the alert is addressed, which prevents P1s from getting lost in the shuffle.
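The escalation described above can be sketched as an ordered list of steps that repeats until someone acknowledges the alert. This is an illustrative model under stated assumptions, not Opsgenie's implementation: the post gives no timings, so the 5-minute gaps and 15-minute cycle are invented, and `Step` and `active_step` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    delay_min: int  # minutes after the cycle starts (assumed values)
    notify: str     # who this step notifies


# Step order follows the post; the delays are assumptions.
STEPS = [
    Step(0, "entire team"),
    Step(5, "each user individually"),
    Step(10, "team admins"),
]
CYCLE_MIN = 15  # assumed length of one escalation cycle


def active_step(steps: List[Step], cycle_min: int,
                elapsed_min: int, acknowledged: bool) -> Optional[Step]:
    """Return the most recent step reached, or None once acknowledged."""
    if acknowledged:
        return None
    # The cycle repeats until the alert is acknowledged.
    position = elapsed_min % cycle_min
    reached = [s for s in steps if s.delay_min <= position]
    return max(reached, key=lambda s: s.delay_min) if reached else None


step = active_step(STEPS, CYCLE_MIN, elapsed_min=12, acknowledged=False)
print(step.notify if step else "nobody")
```

Taking the elapsed time modulo the cycle length is what makes the policy loop back to the first step instead of going quiet after the admins are notified.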
Routing rules
Routing rules in particular set Opsgenie apart from other incident management tools. Many users are surprised to learn that you can route alerts based on the message, time of day, priority, source, tags, and more. Routing rules integrate with escalation policies to dispatch alerts to available team members. The flexibility in routing alerts based on payload elements is not available in any other tool at the time of writing. Filters built with “If” and “And Then” logic direct each alert to the right destination.
The example here shows how routing rules can cut down alert fatigue for your team, sending notifications only when necessary. The first rule takes into account the priority of the alert. If the alert is designated as “critical,” it is routed to team members who are able to solve the problem. If the alert is received after hours, it will reach the folks who are on call. You will notice that the last rule routes all P5 alerts to “no one.” In these cases, Opsgenie records the incident but does not notify anyone, since P5 alerts are informational only and don’t need immediate attention. No one wants to be woken up at 1 a.m. for something that can easily be addressed the next day.
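As a rough mental model, the three rules above can be written as an ordered list of condition and target pairs. The sketch below assumes rules are checked top to bottom and the first match wins; it also assumes "critical" maps to P1, that after hours means outside 9 a.m. to 5 p.m., and that unmatched alerts fall through to a default target. The target labels and every name in the code are hypothetical, and none of this is Opsgenie's actual API.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, List, Optional, Tuple


@dataclass
class Alert:
    message: str
    priority: str   # "P1" (critical) through "P5" (informational)
    received: time  # local time the alert arrived


def after_hours(t: time) -> bool:
    """Assumed business hours: 09:00 to 17:00."""
    return t < time(9) or t >= time(17)


# (condition, target) pairs; a target of None means "record, notify no one".
Rule = Tuple[Callable[[Alert], bool], Optional[str]]

RULES: List[Rule] = [
    (lambda a: a.priority == "P1", "escalation: responders"),
    # P5 is excluded here so an after-hours P5 still reaches the last rule.
    (lambda a: a.priority != "P5" and after_hours(a.received),
     "schedule: on-call"),
    (lambda a: a.priority == "P5", None),
]
DEFAULT_TARGET = "schedule: business hours"  # assumed fallback


def route(alert: Alert) -> Optional[str]:
    """Return the target of the first matching rule, else the default."""
    for condition, target in RULES:
        if condition(alert):
            return target
    return DEFAULT_TARGET


print(route(Alert("server room +1 degree", "P5", time(1, 0))))
```

With first-match semantics, rule order matters: without the P5 exclusion in the second rule, an informational alert at 1 a.m. would page whoever is on call before the "no one" rule was ever reached.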
Notification rules
Opsgenie sends notifications via phone call, SMS, mobile push, or email. One strategy for cutting down on unnecessary alerts is to determine the notification method based on priority. P4s and P5s are low priority; if the server room is one degree warmer than usual, it’s good to have that on record, but it doesn’t require immediate intervention.
If there’s a problem with the SQL server and it’s impacting customers, then the alert is a P1, and an immediate phone call is warranted. The image below shows how easy it is to sort your notification methods based on alert priority.
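Choosing a notification method by priority amounts to a simple lookup. The sketch below is illustrative only: the post confirms that a P1 warrants a phone call and that P4s and P5s are low priority, but the specific methods assigned to P2 through P5 are assumptions, as are the names `METHODS_BY_PRIORITY` and `methods_for`.

```python
from typing import Dict, List

# Priority -> notification methods, loudest first.
# Only "P1 gets a phone call" comes from the post; the rest is assumed.
METHODS_BY_PRIORITY: Dict[str, List[str]] = {
    "P1": ["phone call", "SMS", "mobile push"],
    "P2": ["SMS", "mobile push"],
    "P3": ["mobile push"],
    "P4": ["email"],
    "P5": ["email"],
}


def methods_for(priority: str) -> List[str]:
    """Return the notification methods configured for a priority."""
    try:
        return METHODS_BY_PRIORITY[priority]
    except KeyError:
        raise ValueError(f"unknown priority: {priority!r}") from None


print(methods_for("P1"))
```

Keeping the mapping in one table makes it easy to see at a glance that only the highest priorities can interrupt someone with a call.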
But what if I work out of a chat tool?
Our tech support team uses various Slack channels for alerts that aren’t work-related – curious engineers set them up to test functionality and establish new use cases. Some monitor mentions of Opsgenie on Twitter, weather alerts, and even police chases.
We want customers to be empowered to handle incidents and issues using the tools they’re already familiar with, so we’ve developed deep integrations with Slack and Microsoft Teams for seamless collaboration. Opsgenie’s available slash commands are more extensive than those of other products on the market, so you can do more in less time. Without leaving your chat tool, you can find out who is on call; open, close, and resolve alerts; add tags and notes; and more. There are also handy buttons for acknowledging, closing, and reassigning alerts while on the go.
Here’s a snippet of what those buttons look like in Slack:
Here’s how easy it is to use Microsoft Teams for alert resolution:
By providing flexibility in the methods used for notification and in the way that alerts are routed, Opsgenie empowers customers to notify the right people at the right time, every time. So let Karen enjoy her vacation, and leave the alerts to those who are empowered to take action.