As I figure out which team to join, oncall comes to mind. I've heard that oncall can be pretty intense and stressful here, especially on infra teams. The company has said that they're making efforts to fix this problem, but I'm unsure what to expect there.
How can I figure out whether a team has a healthy oncall rotation? I don't want to join a team just to be burned out by a crazy oncall.
One thing to evaluate is the type of incidents the oncall has historically faced. (It's best to ask a senior eng on the team to walk you through it.)
Some oncalls are difficult because the team is basically middleware -- you get the alert, and your job is to find the correct team to actually fix the issue. These are not fun teams to be on, and it's hard to make these oncalls better.
However, some oncalls can be improved relatively easily, through better documentation or by building a simple tool around timing or correlation. If that's the case, joining one of these teams could actually be an opportunity! Improving the oncall is a great way to ramp up, and the bar to "ship something" is generally lower than for a production feature.
Appreciate the advice! What are some examples of good and bad on-call schedules?