This article has been updated and enriched for February 2022
Tech products play a crucial role in many people’s lives today. Definitions of “crucial” vary, but banking apps, communication platforms, health systems, and travel management software are just a handful of examples that genuinely prop up modern society.
Add to that the ever-present SLA (service level agreement, contractually guaranteeing a certain standard of service) and the pressure to optimize your MTBF (mean time between failures), MTTR (mean time to recovery, repair, respond, or resolve), MTTF (mean time to failure), and MTTA (mean time to acknowledge), and solving your product’s problems quickly and efficiently becomes hugely important for both your customers and your bottom line.
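To make those acronyms concrete, here is a minimal sketch of how you might compute three of them from a list of incidents. The `Incident` record, its field names, and the sample timestamps are all made up for illustration, and MTBF is taken here as the average uptime between one incident being resolved and the next one starting, which is only one of several definitions in use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record; the field names are assumptions for this sketch.
    started: datetime       # when the failure began
    acknowledged: datetime  # when a responder picked it up
    resolved: datetime      # when service was restored


def mean(deltas):
    """Average a non-empty list of timedeltas."""
    return sum(deltas, timedelta()) / len(deltas)


def mtta(incidents):
    """Mean time to acknowledge: failure start to first response."""
    return mean([i.acknowledged - i.started for i in incidents])


def mttr(incidents):
    """Mean time to recovery: failure start to service restored."""
    return mean([i.resolved - i.started for i in incidents])


def mtbf(incidents):
    """Mean uptime between one resolution and the next failure (needs 2+ incidents)."""
    ordered = sorted(incidents, key=lambda i: i.started)
    gaps = [nxt.started - prev.resolved for prev, nxt in zip(ordered, ordered[1:])]
    return mean(gaps)


# Invented sample data: two incidents, one week apart.
incidents = [
    Incident(datetime(2022, 2, 1, 2, 0), datetime(2022, 2, 1, 2, 10), datetime(2022, 2, 1, 3, 0)),
    Incident(datetime(2022, 2, 8, 3, 0), datetime(2022, 2, 8, 3, 20), datetime(2022, 2, 8, 3, 30)),
]

print("MTTA:", mtta(incidents))  # 0:15:00
print("MTTR:", mttr(incidents))  # 0:45:00
print("MTBF:", mtbf(incidents))  # 7 days, 0:00:00
```

Note that the first two numbers are ones you want to shrink, while MTBF is one you want to grow.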
To keep these metrics healthy (short response and recovery times, long stretches between failures), businesses have lots of options. Today, we’re going to look at the on-call strategy. In smaller businesses, this may not even be a decision: the lucky victim simply has to keep their mobile on all night or all weekend.
This isn’t sustainable. If anyone is expected to work outside their standard hours, you need an on-call rota.
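At its simplest, a rota is just a fair rotation written down in advance. The sketch below assumes week-long shifts handed out round-robin; the `build_rota` helper and the engineers’ names are invented for illustration, and a real rota would also need to handle holidays, swaps, and time zones.

```python
from datetime import date, timedelta
from itertools import cycle


def build_rota(engineers, first_day, weeks):
    """Assign one engineer per week, round-robin, starting on first_day."""
    rotation = cycle(engineers)
    return [
        (first_day + timedelta(weeks=n), next(rotation))
        for n in range(weeks)
    ]


# Hypothetical team of three, starting on Monday 7 February 2022.
rota = build_rota(["Asha", "Ben", "Chloe"], date(2022, 2, 7), weeks=4)
for start, engineer in rota:
    print(start.isoformat(), engineer)
# 2022-02-07 Asha
# 2022-02-14 Ben
# 2022-02-21 Chloe
# 2022-02-28 Asha
```

Even a toy like this makes one thing visible: with three people, each person is on call every third week, which is a useful number to look at before deciding whether the load is reasonable.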
The interplay between DevOps and on-call strategies is an interesting one. On one hand, some parts of having an on-call culture can be harder on techs in DevOps-focused companies. With the motto “you build it, you run it, you take care of it when it falls over”, there can be a lot of stress on individual devs and teams to be highly available when things go wrong.
On the other hand, with DevOps' insistence on resilience, things should go wrong much less frequently. Also, since devs should be intimately familiar with their own code, troubleshooting tends to be easier and take less time.
You’ll notice that we said, “tends”...
Regardless of your size or stage in the DevOps journey, you need more than an on-call schedule: you need an on-call strategy. Businesses are living, growing things, and no matter how resilient your systems or how competent your devs, things will go wrong. Things will also change, and changes can make old schedules ineffective or redundant with remarkable speed.
On-call has a bad reputation, mainly because techs get stretched too thin and are asked to be so available, so frequently, that they never get any true downtime. This is especially true in startup culture and countries where work-life balance isn’t protected by law and common consensus - devs on-call burn out, crash, and cash out.
That’s not the only way to do things. With planning, practicality, and a little empathy, you can create an on-call strategy that will support your product and customers, respect your devs, and grow and breathe with your business’ transformation.
Read on to find out how to do on-call better.
Luckily, when an on-call strategy stinks, it tends to give off some pretty clear signals. Seen any of these on your team? Tread carefully, because it’s very likely you have a sucky on-call rota!
Creating a great on-call strategy isn’t hard, but it does take deliberate work and discussion. If you already have one, make it better. If you don’t have one, make one! You’ll keep your devs happy and productive, your customers satisfied and positive, and your company calm and controlled in the case of a true emergency.
Nobody loses when you’ve got your on-call sorted out, so sort it out today!