Imagine Learning + Atlassian
Imagine Learning Quiets the Noise and Alleviates Alert Fatigue with Opsgenie
Industry
EdTech
Location
Provo, Utah
Number of Users
241
Products
Marketplace Apps
Imagine Learning is an EdTech company founded in 2004 that serves students for whom traditional teaching is not effective. Its software is designed for students from pre-K through high school who may have trouble with reading, writing, and math. Schools purchase the software to support students with learning difficulties and help them succeed in their studies.
Feeling the Pain of Alert Fatigue
Keith Smith joined Imagine Learning as Principal Site Reliability Engineer after years in the DevOps space. He was familiar with various incident monitoring tools, including Opsgenie. Imagine Learning had many tools in place, but consolidation and effective alerting were missing.
“[At the time] the on-call team only got alert messages via email. It was stupid, there was so much noise. I would get up each night at 1 a.m., look at my phone and go back to bed. I set out to say there is a better way.”
Due to all the noise, the alerts weren’t meaningful and they weren’t actionable. The process was completely reactive and teams were left without an efficient way to communicate during incidents.
“Support call volume would go up, which indicated a problem, and then the rep in support would escalate it. But that was the only chain of communication: the customer would tell us something was wrong and then we would fix it.”
Making the Business Case
Keith knew he needed to implement a tool like Opsgenie but had to formulate a business case for upper management, which turned out to be easier than expected.
“Two weeks into the job, I was setting up alerts and looking at metrics when I realized we had been down for 24 hours and had no idea!” The problem was fixed pretty quickly, but a 24-hour outage was completely avoidable.
He had also essentially become a single point of failure, which was not scalable or sustainable for a company with over 500 employees scattered across the U.S., India, and Argentina.
“Within 3 months of adopting Opsgenie, we reduced the number of incidents by 900%.”
Keith Smith
Principal Site Reliability Engineer
Consolidation and improved communication were key to maintaining the infrastructure required for the company’s success.
“What if I went on vacation? What happens to the alerts for two days? I went to my boss and I told him, this is not sustainable, we’re going to have issues, and cited sources [including the 24-hour outage] to prove it.”
Between a painful on-call schedule built mostly on email alerts, a reactive approach to problems, and metrics to back up the need for a modern incident management platform, the case was made, and Imagine Learning moved forward with Opsgenie.
Reliable Alerting
With over 20 tools and applications to manage, Opsgenie’s ability to integrate with the company’s IT stack was key to quieting the noise.
“Every time I have wanted to connect a source to [Opsgenie], there has been a path, even if just a webhook.”
Deep integrations with Slack and Jira mean Imagine Learning now has an automated process. Opsgenie updates the status page, creates a Jira ticket, sends a Slack notification, and wakes the right people up at the right time.
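To illustrate the webhook path Keith mentions, here is a minimal sketch of how an event from a monitoring tool could be turned into a request to Opsgenie's Alert API. It is not Imagine Learning's actual setup: the event fields (`service`, `check`, `severity`, `detail`), the severity mapping, and the helper names are assumptions made for illustration. The `alias` field is what Opsgenie uses to de-duplicate alerts, so repeated failures of the same check collapse into one alert instead of adding noise.

```python
import json
import urllib.request

OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"

# Hypothetical severity names from a monitoring tool, mapped to Opsgenie priorities.
SEVERITY_TO_PRIORITY = {"critical": "P1", "error": "P2", "warning": "P3", "info": "P5"}


def build_alert_payload(event: dict) -> dict:
    """Translate a (hypothetical) monitoring event into an Opsgenie alert body."""
    service = event["service"]
    check = event["check"]
    return {
        # Opsgenie limits the alert message to 130 characters.
        "message": f"{service}: {check} failing"[:130],
        # Same alias => same alert, so repeats are de-duplicated.
        "alias": f"{service}/{check}",
        "description": event.get("detail", ""),
        "priority": SEVERITY_TO_PRIORITY.get(event.get("severity", ""), "P3"),
        "tags": [service],
        "source": event.get("source", "webhook"),
    }


def build_request(event: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP request that would create the alert."""
    body = json.dumps(build_alert_payload(event)).encode("utf-8")
    return urllib.request.Request(
        OPSGENIE_ALERTS_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"GenieKey {api_key}",
        },
    )
```

Passing the result of `build_request(event, api_key)` to `urllib.request.urlopen` would submit the alert. Routing it onward to Jira, Slack, and the status page is then handled by the integrations configured in Opsgenie rather than by this code.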
“Now we have one major incident every year. It’s becoming more fun. I’m able to sleep at night and it freed up my time to work on other projects.”
Keith Smith
Principal Site Reliability Engineer
“Beyond a faster MTTR, the biggest thing we gain is the communication piece, telling our customers what’s going on and the 500 people in our offices across the country [and world] as soon as an incident hits.”
Relief After Opsgenie
Sharing the on-call schedule and only getting woken up when necessary has enabled Keith to diversify his work and to reduce response time from 24-36 hours to 15 minutes or less.
Opsgenie enabled Keith to create an efficient incident management and on-call process that reduced MTTR and also improved his team’s quality of life. For a company providing a software product, resolving an issue quickly is vital. Within 3 months of using Opsgenie, there was a 900% reduction in incident volume.