Microsoft's Azure Data engineering team is seeking a Senior Site Reliability Engineer (SRE) to join its databases team, working specifically on Azure Cosmos DB. The role is central to maintaining Microsoft's operational database systems and ensuring their high availability and performance. The position offers the opportunity to work on a globally distributed, massively scalable database service. The ideal candidate will focus on automating root cause analysis, resolving issues proactively, and meeting strict Service Level Objectives (SLOs).
The role involves working with enterprise customers, handling service escalations, and driving new solutions for service reliability. You'll be part of a team that values diversity and different perspectives, in a hybrid arrangement that allows working from home up to 50% of the time. The position requires strong technical expertise in software engineering or systems administration, with a focus on large-scale distributed systems.
As an SRE at Microsoft, you'll be responsible for analyzing large volumes of telemetry data, building automation, and ensuring service reliability. The role offers competitive compensation, comprehensive benefits, and the chance to work with talented engineers in a startup-like environment within Microsoft. You'll help build and shape the Livesite Automation and AI Ops stack in Cosmos DB, with a direct impact on critical systems used in the healthcare, retail, telecommunications, and IoT sectors.