candiddevmike
What I tell people new to on-call: ask for overtime, time off, or some form of compensation (and don't go along with the "it's part of your standard compensation" bullshit).

On-call is you agreeing to give your time, your weekends, your freedom to your work. It's beyond the standard 8-5. Don't trade that for free.

CalRobert
"This could destroy your marriage and your family." seems like a good intro.

I still remember getting paged at my tenth anniversary dinner with my wife, when I was explicitly NOT on the PagerDuty rotation. Ruined the whole day. And I wasn't making particularly good money.

If you aren't free to do what you want, you should be getting paid like you're at work.

xtracto
On-call is kind of like open-office layouts. It was created to reduce costs for companies, at the expense of employees.

Whatever happened to hiring people on schedules that cover the full 24/7?

Manufacturing companies don't have "on-calls"; they have 12/12 shifts, or 3x4 shifts.

TehShrike
If you work a job where having one of these incidents once a year or more is "normal", then the dev team needs to devote most of its time to fixing that, or you need to change employers.

odysseus
Good advice in this article, especially the bits about communicating every 15-30 minutes depending on severity. Comms are invaluable for timeline/postmortems.

Also, for secondary/shadow on-calls, you will need to remind the primary to loop you in, as they will be busy.

Try not to be on-call too often, but also try not to be on-call too little. You need exposure to the latest types of events happening and don't want to get rusty. Once every 1.5 months is a good balance for me.

matrix87
Just a question for other people who have been in the industry longer. I'm somewhat new, and wondering if my company's on-call is "normal" or abusive.

My current company has a rotating week of oncall. Happens every 2 months or so. Oncall gets paged first and is expected to be available 24/7. But if they escalate further, whichever dev or manager it gets escalated to is expected to be available 24/7

By 24/7, I mean, they don't tell you that you're allowed to sleep. They just fired a manager for not being willing to wake up in the middle of the night for pages

Edit: also a bunch of people on our team think it's normal to ping and ask for help on non oncall stuff outside of business hours (like 7 or 8)

Edit 2: I forgot to add, we are not paid anything extra for oncall (or any additional work time outside of business hours). It's salaried

exmicrosoldier
There used to be a separate job for this, and they dumped it on startup engineers (at otherwise failing businesses), and now they dump it on all engineers.

parpfish
I’ve often heard the advice for on-call to focus on triage and call in support for big problems.

But… doesn’t that mean that everybody is technically on call? There’s the main person answering the pager, but if the expectation is that they can pull in reinforcements as needed, that means everyone should be ready to get pulled into action at all times.

technick
PagerDuty gave me insomnia; it's not worth it. Tell your organization they can hire someone to work that shift.

jordemort
What I tell people new to on-call: "Quit. Find another job."

I won't accept jobs with on-call rotations anymore.

bcrosby95
On call means different things for different companies. We used to page for non-emergencies. But we eventually changed it to page only for actual service outages or core metrics shitting the bed. If neither of those is happening, it waits until morning or Monday.
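
A minimal sketch of that kind of paging rule, purely as an illustration. The `Alert` fields, the `route` function, the returned labels, and the 9:00 to 17:00 weekday business hours are all assumptions made up for this example, not any real alerting tool's API:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Alert:
    # Hypothetical alert record; field names are illustrative only.
    name: str
    service_down: bool = False
    core_metric_breached: bool = False


def should_page(alert: Alert) -> bool:
    # Page only for an actual outage or a core metric going bad.
    return alert.service_down or alert.core_metric_breached


def route(alert: Alert, now: datetime) -> str:
    if should_page(alert):
        return "page"
    # Assumed business hours: Monday to Friday, 09:00 to 17:00.
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    if now.weekday() < 5 and 9 <= now.hour < 17:
        return "ticket"
    # Everything else waits until morning or Monday.
    return "wait"
```

With a rule like this, a disk-usage warning at 3 AM on a Saturday just sits in the queue, while a real outage at the same time still wakes someone up.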

Or, maybe, if you're large enough... hire night shift people. I have friends who cut their teeth on night shift ops.

I have friends who work more on the sysadmin side of things, and on call for them just seems like extra work. They're glued to their laptops answering requests.

zabzonk
It was a dreaded feature of one job I had way back when, supporting a server for static trading data (counterparty info, and other stuff) for sites in London, Hong Kong and New York. We all hated it, until one day one of the guys "lost" the support laptop on the Tube. We then did a bit of scripting and fiddling with permissions so that the guys in HK and NY could fix all common problems by "turn it off and turn it on again" magic.

Bye-bye support trauma.

dopylitty
If a company wants systems up 24/7 they should hire three shifts of people to support it.

Not willing to pay for three shifts? Shut the system off.

deathanatos
This is a pretty good article.

Similarly to the "Heroism isn't…" section, I'd say: Breathe. I've been asked "how do you stay so calm when something is going wrong?" and the honest truth is I'm scared! Or at least, I have that pit, in my stomach, going "oh no it's not working, will we figure this one out?" It's just not a useful thing. Tell that fear to take a backseat, and attempt to let the more logical side of you problem solve. And like TFA says, call for help if you need it; two minds are better than one.

At the management level, you can also do a sort of corollary to basically everything in TFA too:

"Call for help": your engineers need to be able to call for help. That means retaining experience, so that the younger engineers can learn from the older ones, and hopefully not trial-by-fire their entire career, and have someone they can fall back on for help. Same goes for the experienced devs, too: it means you need two experienced devs. I've been the only experienced person on the team, and it sucks, because I don't have the answer to everything.

"It is your job to see that issues get addressed": at the management layer, you need to make sure the incentives are focused on that, not something inane, like "mean time to resolution". Time to "the incident in PagerDuty is closed" is meaningless, and will be gamed to something like "we closed the incident because the immediate instance of the problem / symptom has been dealt with". You want the actual, underlying root cause debugged and fixed, and ideally, that eng should never see that entire class of problem again. But this means understanding the root cause, and understanding the system well enough to see the problem through to conclusion, which often means things like "ok, this needs to be fixed, and I need to prioritize someone familiar with that portion of the system to fix it". All too often, that follow-through just doesn't happen. And when it doesn't, your eng pays for it, in the form of getting woken up.

"Don't sacrifice your health": are your eng sacrificing their health? Is your on-call rotation too frequent? (At the worst, I've been on-call 100% of the time. That was too often!)

aaomidi
I really disagree with "involve other people".

This implies that everyone is effectively on call 24/7.

You have a primary and secondary. No one else should be paged unless they’re on the rotation at that time.

infomaniac
Worked for an eCom store for 3 years, was on-call 24/7 for most of it due to understaffing. In this context, every second of downtime was actual money being lost. CEO drilled in the gravity of each outage.

Took me a good few months after changing jobs to not get crazy anxiety every time my phone rang.

After working in a much healthier on-call setup later in my career supporting a large SaaS, I actually really like it. High stakes produce quick learnings.

Not for everyone, but everyone should try it (and be compensated FFS).

rufus_foreman
I did on-call in the '90s. I wouldn't do it today on a regular basis; I need my sleep more than I need any job.

I was a DBA back then. The author of the post talks about calling for "back-up". Back then they would page me (still just pagers then) and say the database was down. Most of the time the database wasn't down, and there was no real evidence the database was down; they were just calling for "back-up".

I've worked at many jobs since then, including at startups, never did an on-call rotation at any of them since the DBA job. If it's important, you should probably have fully awake people scheduled to deal with it.

To me that's different than supporting a release of code that you wrote. I mean if we really have to do this release at 9 PM and I'm the guy who wrote it, I'll probably show up, after I slept all day. But I'm no longer your database buddy that you commiserate with during the night when everything goes to shit since no one else will answer the pager.

invalidname
This post is great timing for me. Lots of good, if somewhat obvious, advice.

I'm considering picking up on-call duties in my new role. In my last company they expected us to do on-call as part of the job but only during what they defined as "working hours" which didn't fit with my schedule. That was one of the reasons I left that particular role. But here they give an 18% salary boost for the on-call duties, and I love debugging hard production problems which is a huge plus.

poopsmithe
The handwritten text is not legible