Site Reliability Engineer

Site Reliability
Senior Software Engineer
Hybrid
5+ years of experience
Cloud

Description For Site Reliability Engineer

Azure CXP CRE is seeking a customer-focused Reliability Engineer passionate about customer reliability engineering, including availability, reliability, resiliency, and uptime at scale for the Azure platform. This role is accountable for improving customer experience on Azure and involves diagnosing and troubleshooting mission-critical customer applications built on the Microsoft Azure platform. The ideal candidate will demonstrate technical breadth while managing complex, highly available services and have a deep understanding of the underlying components (Azure Platform, Azure SDK, Azure Portal). They will work directly with customers, customer support, live site teams, and engineering.

Responsibilities include:

  • Participate in an on-call coverage rotation (approximately 15% of the time) for platform communications and security
  • Collaborate closely with engineering and product management teams to drive product improvements based on customer feedback
  • Improve the customer experience by analyzing signals from various sources and driving root cause analyses (RCAs) and service improvements involving bug fixes
  • Drive continuous improvement in the Azure platform by incorporating feedback from internal and external customers
  • Identify and drive requirements for enhanced customer resiliency and platform reliability
  • Identify and drive the implementation of customer-centric mitigation strategies and playbooks for operations
  • Participate in the design of next-generation architecture for cloud infrastructure services, with a focus on strategic customer scenarios

Qualifications:

  • Must have service engineering experience in a 24/7/365 enterprise environment
  • Technical expertise in Azure services and capabilities or cloud platforms (desired)
  • Fluency in one or more automation languages (e.g., PowerShell, CLI)
  • Strong communication skills for leading and managing communication with customers, internal Microsoft stakeholders, and third-party vendors
  • Understanding of high availability, disaster recovery, business continuity, and performance tuning
  • Strategic thinking, quantitative and analytical skills, team leadership, and collaboration
  • Excellent problem resolution, judgment, negotiation, and decision-making skills
  • Strong knowledge of the Windows platform or Linux, developer tools, and the ability to diagnose and debug user code (desired)
  • BS/BA in computer science, engineering, mathematics, or equivalent experience (desired)

This role offers the opportunity to work on one of Microsoft's most exciting products and advance Microsoft's cloud-first strategy. The successful candidate will be part of the Azure Customer Experience (CXP) Customer Reliability Engineering (CRE) Team, which leads world-class customer reliability initiatives and provides modern, customer-centric experiences at scale.

Responsibilities For Site Reliability Engineer

  • Participate in an on-call coverage rotation for platform communications and security
  • Collaborate with engineering and product management teams to drive product improvements
  • Analyze signals and drive root cause analyses (RCAs) and service improvements
  • Drive continuous improvement in the Azure platform
  • Identify and drive requirements for enhanced customer resiliency and platform reliability
  • Implement customer-centric mitigation strategies and playbooks for operations
  • Participate in the design of next-generation architecture for cloud infrastructure services
  • Be enthusiastic, self-motivated, and a great team player
  • Demonstrate excellent collaboration, organizational, and time management skills
  • Be data-driven with a focus on achieving business results in projects
  • Develop key partnerships

Requirements For Site Reliability Engineer

  • Service engineering experience in a 24/7/365 enterprise environment
  • Fluency in one or more automation languages (e.g., PowerShell, CLI)
  • Strong communication skills
  • Understanding of high availability, disaster recovery, business continuity, and performance tuning
  • Strategic thinking, quantitative and analytical skills
  • Excellent problem resolution, judgment, negotiation, and decision-making skills
  • Ability to manage and prioritize multiple tasks
  • Excellent written and oral communication skills

Benefits For Site Reliability Engineer

Medical Insurance
Education Budget
Parental Leave
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect
