Senior Site Reliability Engineer

Microsoft is a company where passionate innovators collaborate to build the data platform for the age of AI, powering data-first applications and driving a data culture.
$108,100 - $199,700
Site Reliability
Senior Software Engineer
Hybrid
5,000+ Employees
6+ years of experience
AI · Enterprise SaaS · Cloud

Description For Senior Site Reliability Engineer

Microsoft's Azure Data engineering team is seeking a Senior Site Reliability Engineer to join its databases team, working specifically on Azure Cosmos DB. This role is crucial to maintaining Microsoft's operational database systems and ensuring high availability and performance. The position offers an opportunity to work with cutting-edge technology in a globally distributed, massively scalable database service.

The ideal candidate will focus on automating root cause analysis and incident mitigation, working to maintain stringent Service Level Objectives (SLOs) of 99.99% availability and latency under 10 ms. You'll be part of a team that operates like a startup while having the resources and impact of a major tech company, building and shaping the Livesite Automation and AI Ops stack in Cosmos DB.

The role combines technical expertise with customer focus, requiring both deep technical knowledge and strong communication skills. You'll work on analyzing massive amounts of telemetry data, implementing automated solutions, and collaborating with various teams to improve service reliability. The position offers competitive compensation, comprehensive benefits, and the chance to work on technology that powers critical systems across healthcare, retail, telecommunications, and IoT sectors.

Microsoft values diversity and seeks candidates with different experiences and perspectives. The hybrid work environment (up to 50% work from home) offers flexibility while maintaining collaborative opportunities. This is an excellent opportunity for someone passionate about large-scale systems, automation, and service reliability to make a significant impact at one of the world's leading technology companies.

Last updated 9 days ago

Responsibilities For Senior Site Reliability Engineer

  • Collaborate with engineering teams on building and enhancing tooling and automation solutions
  • Work with customers to understand pain points around Supportability and SLO attainment
  • Design and implement changes to service telemetry
  • Enhance the customer-facing experience through proactive alerting
  • Analyze data and provide operational insights to Design and Product teams
  • Interface with large enterprise customers for handling service escalations

Requirements For Senior Site Reliability Engineer

Python
Java
  • 6+ years technical experience in software engineering, network engineering, or systems administration
  • Understanding of Observability and MELT (metrics, events, logs, traces) implementation patterns for large-scale services
  • Experience in Logic Apps and authoring Jupyter Notebooks
  • Experience in analyzing and troubleshooting large-scale distributed systems
  • 5+ years of SRE or SWE experience running large-scale cloud services
  • 5+ years of hands-on experience in Python/Java/C#
  • 3+ years of operational experience in improving Service Reliability
  • Must pass Microsoft Cloud Background Check

Benefits For Senior Site Reliability Engineer

Medical Insurance
Education Budget
Parental Leave
Mental Health Assistance
  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

