Senior Systems Reliability Operations Engineer

The Walt Disney Company, together with its subsidiaries and affiliates, is a leading diversified international family entertainment and media enterprise that includes three core business segments: Disney Entertainment, ESPN, and Disney Experiences.
Site Reliability
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
Enterprise SaaS

Description For Senior Systems Reliability Operations Engineer

The Disney Technology Operations Command Center (DTOC) is seeking a Senior Systems Reliability Operations Engineer to join its 24x7x365 mission-critical services operation center. This role is responsible for monitoring systems, identifying issues, and coordinating with technologists across segments to fine-tune system operations and resolve service interruptions. The SRO Engineer will provide operational oversight and technical leadership, examining IT systems for defects and communicating maintenance schedules and critical events across the company.

Key responsibilities include:

  • Driving efficiency and effectiveness of Event, Incident, Major Incident, Request Fulfillment, and Problem Management processes
  • Implementing and maintaining technology observability and alerting solutions
  • Establishing and maintaining service level objectives (SLOs) and service level indicators (SLIs) for technology services
  • Proactively identifying, diagnosing, and resolving infrastructure, application, and IT operations issues
  • Developing automation tools and scripts to improve IT operations efficiency
  • Analyzing and publishing operational utilization and service performance metrics
  • Participating in creating and reviewing department procedures and operational readiness plans
  • Serving as incident commander on service outage calls

Required qualifications:

  • BA/BS in Computer Science, Engineering, or related field (or equivalent work experience)
  • 5+ years of experience supporting converged infrastructure stacks
  • 5+ years of experience leading incident recovery with multidisciplinary teams
  • 3+ years of experience in a large IT shared services or outsourced environment
  • Experience with cloud operations (AWS, Google Cloud, or Azure)
  • Strong understanding of Windows, Linux/Unix operating systems, and networking concepts
  • Proficiency in scripting languages (e.g., Python, Bash, Ruby) and automation tools
  • Familiarity with observability, monitoring, and alerting tools
  • ITIL v3 Certification preferred

This role offers the opportunity to work with world-class leaders and drive strategies that keep The Walt Disney Company at the leading edge of entertainment technology.

Last updated 2 months ago

Benefits For Senior Systems Reliability Operations Engineer

  • Medical Insurance
  • Dental Insurance
  • Vision Insurance
  • 401(k)
  • Education Budget
  • Health Insurance & Wellbeing
  • Childcare Options
  • Paid Time Off
  • Retirement Programs
  • Tuition Assistance
  • Weekly Pay
