Senior Site Reliability Engineer

Microsoft builds the data platform for the age of AI, powering data-first applications and driving a data culture through its Azure Data engineering team.
$108,100 - $199,700
Site Reliability
Senior Software Engineer
Hybrid
5,000+ Employees
6+ years of experience
AI · Enterprise SaaS · Cloud

Description For Senior Site Reliability Engineer

Microsoft's Azure Data engineering team is seeking a Senior Site Reliability Engineer to join its databases team, working specifically on Azure Cosmos DB. This role is crucial to maintaining Microsoft's operational database systems and ensuring high availability and performance. The position offers an opportunity to work with cutting-edge technology in a globally distributed, massively scalable database service. The ideal candidate will focus on automating root cause analysis, implementing proactive issue resolution, and maintaining strict Service Level Objectives (SLOs).

The role involves working with enterprise-level customers, handling service escalations, and driving innovative solutions for service reliability. You'll be part of a team that values diversity and different perspectives, working in a hybrid environment with up to 50% work-from-home flexibility. The position requires strong technical expertise in software engineering or systems administration, with a focus on large-scale distributed systems.

As an SRE at Microsoft, you'll be responsible for analyzing massive amounts of telemetry data, implementing automation solutions, and ensuring service reliability. The role offers competitive compensation, comprehensive benefits, and the opportunity to work with talented engineers in a startup-like environment within Microsoft. You'll be at the forefront of building and shaping the Livesite Automation and AI Ops stack in Cosmos DB, making a significant impact on critical systems used in Healthcare, Retail, Telecommunications, and IoT sectors.

Last updated 2 days ago

Responsibilities For Senior Site Reliability Engineer

  • Collaborating with engineering teams on building and enhancing tooling and automation solutions
  • Working with customers to understand pain points around Supportability and SLO attainment
  • Designing and implementing changes to service telemetry
  • Enhancing the customer-facing experience through proactive alerting
  • Analyzing data and providing operational insights to Design and Product teams
  • Serving as the single point of contact for service escalations from large enterprise customers

Requirements For Senior Site Reliability Engineer

  • 6+ years technical experience in software engineering, network engineering, or systems administration
  • Understanding of Observability and MELT (metrics, events, logs, traces) implementation patterns for large-scale services
  • Experience in Logic Apps and authoring Jupyter Notebooks
  • 5+ years of SRE or SWE experience running large-scale cloud services
  • 5+ years of hands-on experience in Python/Java/C#
  • 3+ years of operational experience in improving Service Reliability, Availability and Performance
  • Systematic problem-solving approach with effective communication skills
  • Must pass Microsoft Cloud Background Check

Benefits For Senior Site Reliability Engineer

Medical Insurance
Education Budget
Parental Leave
Mental Health Assistance
  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect
