Azure CXP CRE is seeking a customer-focused Reliability Engineer passionate about customer reliability engineering, including availability, reliability, resiliency, and uptime at scale for the Azure platform. This role is accountable for improving customer experience on Azure and involves diagnosing and troubleshooting mission-critical customer applications built on the Microsoft Azure platform. The ideal candidate will demonstrate technical breadth while managing complex, highly available services and have a deep understanding of the underlying components (Azure Platform, Azure SDK, Azure Portal). They will work directly with customers, customer support, live site teams, and engineering.
Responsibilities include: • Participate in an on-call coverage rotation (approximately 15% of the time) for platform communications and security. • Collaborate closely with engineering and product management teams to drive product improvements based on customer feedback. • Improve the customer experience by analyzing signals from various sources and driving root cause analyses (RCAs) and service improvements involving bug fixes. • Drive continuous improvement in the Azure platform by incorporating feedback from internal and external customers. • Identify and drive requirements for enhanced customer resiliency and platform reliability. • Identify and drive the implementation of customer-centric mitigation strategies and playbooks for operations. • Participate in the design of next-generation architecture for cloud infrastructure services, with a focus on strategic customer scenarios.
Qualifications: • Must have service engineering experience in a 24/7/365 enterprise environment. • Technical expertise in Azure services and capabilities or cloud platforms (desired). • Fluency in one or more automation languages (e.g., PowerShell, CLI). • Strong communication skills for leading and managing communication with customers, internal Microsoft stakeholders, and third-party vendors. • Understanding of high availability, disaster recovery, business continuity, and performance tuning. • Strategic thinking, quantitative and analytical skills, team leadership, and collaboration. • Excellent problem resolution, judgment, negotiation, and decision-making skills. • Strong knowledge of the Windows platform or Linux, developer tools, and the ability to diagnose and debug user code (desired). • BS/BA in computer science, engineering, mathematics, or equivalent experience (desired).
This role offers the opportunity to work on one of Microsoft's most exciting products and advance Microsoft's cloud-first strategy. The successful candidate will be part of the Azure Customer Experience (CXP) Customer Reliability Engineering (CRE) Team, which leads world-class customer reliability initiatives and provides modern, customer-centric experiences at scale.