We’ve distilled years of on-call trial and error into a new free ebook. On Call: The definitive guide to running productive and happy on-call teams provides a comprehensive road map to building an effective program while balancing the important human side of the on-call challenge.
Inside, you’ll find:
- A full, 100-plus-page guide to creating and implementing an effective on-call program.
- Analysis of commonly used on-call models, their effectiveness in the real world, and pitfalls to avoid when setting up your on-call teams.
- An exclusive foreword from Andrew Clay Shafer, Vice President of Transformation at Red Hat and co-founder of Puppet.
To celebrate the new project, we sat down with author and on-call expert Serhat Can for a quick question and answer session.
Q&A with author Serhat Can
Serhat Can is a versatile engineer who has built and operated products as part of Atlassian’s Opsgenie team since 2015. As an engineer and DevOps evangelist, his main interest is helping teams build better on-call and incident response practices.
Atlassian: Do you remember your first on-call shift?
Serhat: Fortunately, I was in good hands. I was about a month into working with my team and started participating in responding to customer issues alongside our customer success team.
I still think that’s a great way to get used to the idea of on-call, and you get the added benefit of learning more about your product. Then, after a few months, I was responding to prod alerts during business hours. It was scary, but I was lucky to have senior engineers right by my side who could help me if I got stuck.
A: Was there anything that surprised you as you dug deeper into this material?
S: The more I wrote, the more I became sure that on-call requires ongoing attention to detail and continuous improvement. There is no silver bullet. The book covers a lot of options and good practices on-call teams may employ while curating their on-call programs. And they are definitely helpful. But the broader picture is that on-call is influenced a lot by your company culture.
If your company cares about you as a human, they’re more likely to care about making your on-call experience better. If your company cares a lot about ownership and has practices in place to make you feel like you are part of something important, you’ll in turn care much more about doing a good job on-call. So, on-call is definitely influenced by these kinds of cultural aspects.
A: It feels like a big theme of your book is that these “technology systems” are only as resilient as the “people systems” that build and operate them. Do you think the rest of the world is realizing that?
S: I must say that most big tech companies realize that technically skilled people are expensive to hire and keep. Investing in people isn’t just the right thing to do – it makes good business sense. Most smart companies realize that better tech is only possible with better people. Resilience is only possible with people architecting resilient systems and people taking care of these systems. On-call is a place where we can clearly see the people effect and improve iteratively.
A: What do you think teams struggle with the most in adopting on-call practices?
S: I spent the last few years speaking at many DevOps conferences and chatting with brilliant people. Almost always, their first question is: how do I convince my management to make alerts and on-call better?
This is partly because some companies are still figuring out how critical IT is to their existence. And part of it is because of business pressure. I think we can make great improvements to the latter with this book. The former – leadership buy-in – I think is easier now with a lot of people talking about DevOps. But we still need to speak the language of the business and bring some data in. The book also covers that, but I must remind readers that change is not easy. As engineers, we should also try our best to understand our business and offer practical solutions to these challenges.
A: What do you think the future holds for on-call? Will more teams be going on-call? Will we see different types of professions embracing on-call?
S: To be honest, no one likes to be on call. Many professions have been practising on-call since way before those of us in IT. But as our dependency on online services has increased, our need for on-call has risen drastically. If the business requires you to be on call, one of the best things you can do as an organization is to share the load among as many people as possible. If someone is on call every other week, that becomes life-impacting, while being on call every six weeks is much more tolerable. On top of this, if developers are on call, we see them write higher-quality code and pay more attention to observability.
Another key aspect is increased ownership once developers feel the customer impact of their work. Given these and other benefits, I see many organizations adopting on-call throughout their tech organization. And beyond engineering, we already see many customer support, marketing, and sales organizations making use of ideas from on-call scheduling and escalations to better serve their customers. I definitely see this need becoming more and more clear for many organizations.