Microsoft's Azure Data engineering team is seeking a Senior Site Reliability Engineer to join its databases team, specifically working on Azure Cosmos DB. This role is crucial to maintaining Microsoft's operational database systems and ensuring their high availability and performance. The position offers an opportunity to work with cutting-edge technology in a globally distributed, massively scalable database service.
The ideal candidate will focus on automating root cause analysis and incident mitigation to help maintain stringent Service Level Objectives (SLOs) of 99.99% availability and latency under 10 ms. You'll join a team that operates like a startup while having the resources and reach of a major tech company, building and shaping the Livesite Automation and AI Ops stack in Cosmos DB.
The role pairs deep technical knowledge with a strong customer focus and clear communication skills. You'll analyze large volumes of telemetry data, implement automated solutions, and collaborate with other teams to improve service reliability. The position offers competitive compensation, comprehensive benefits, and the chance to work on technology that powers critical systems in the healthcare, retail, telecommunications, and IoT sectors.
Microsoft values diversity and seeks candidates with a range of experiences and perspectives. The hybrid work arrangement (up to 50% work from home) offers flexibility while preserving opportunities for in-person collaboration. This is an excellent opportunity for someone passionate about large-scale systems, automation, and service reliability to make a significant impact at one of the world's leading technology companies.