Senior Site Reliability Engineer

Microsoft is a global technology company that powers cloud services through unified global datacenters, enabling Commercial Cloud services.
$117,200 - $229,200
Site Reliability
Senior Software Engineer
Hybrid
5,000+ Employees
6+ years of experience
Enterprise SaaS · Cloud

Description For Senior Site Reliability Engineer

Microsoft's Cloud Operations & Innovation (CO+I) team is seeking a Senior Site Reliability Engineer to join its Cloud Infrastructure Health team. This role is crucial in powering Microsoft's cloud services through unified global datacenters, which enable approximately 30% of Microsoft's revenue through Commercial Cloud.

The position involves working with state-of-the-art distributed systems that handle petabyte-scale telemetry using Machine Learning and traditional software to ensure Cloud Availability and Safety goals. You'll be responsible for analyzing telemetry data from datacenter critical environments and secondary signals in both real-time and offline scenarios to enable time-sensitive insights directly impacting Cloud Operations.

As a Senior SRE, you'll identify and deliver software improvements using your expertise in software development, complexity analysis, and scalable system design. The role requires strong collaboration skills to work with other engineering teams, ensuring services and systems maintain high stability and performance. You'll be responsible for developing code, scripts, and tools that reduce operational burden through automation, particularly in areas such as onboarding system capabilities to newer datacenters.

Key responsibilities include participating in on-call rotations, resolving live site incidents, and documenting solutions that prevent issue recurrence. You'll also be involved in capacity planning, pattern identification, and trend analysis to drive continuous improvement.

The position offers competitive compensation, comprehensive benefits, and the opportunity to work in a hybrid environment with up to 50% work-from-home flexibility. You'll be part of Microsoft's mission to empower every person and organization on the planet to achieve more, working in an inclusive environment that values growth mindset, innovation, and collaboration.

This role is well suited to someone with strong technical experience in software engineering or systems administration who is passionate about cloud infrastructure and enjoys working with complex distributed systems. The position offers excellent growth opportunities and the chance to make a significant impact on Microsoft's cloud computing transformation.

Last updated 2 days ago

Responsibilities For Senior Site Reliability Engineer

  • Own deployment, availability, reliability, performance and customer escalation targets for Critical Environment Telemetry solutions
  • Design, develop, and maintain data pipelines and back-end services for real-time decisioning
  • Write high-quality, maintainable, high-performance code
  • Manage automated unit and integration test suites
  • Work with Project Managers and business stakeholders to design and deliver new features
  • Identify opportunities and drive implementation of monitoring and automation capabilities
  • Investigate and resolve Customer Reported Incidents

Requirements For Senior Site Reliability Engineer

Linux
Kubernetes
  • 6+ years technical experience in software engineering, network engineering, or systems administration
  • 2+ years of experience working on system uptime, performance, service monitoring, and capacity planning
  • Bachelor's or Master's Degree in Computer Science, Information Technology, or related field
  • Ability to pass the Microsoft Cloud Background Check

Benefits For Senior Site Reliability Engineer

Medical Insurance
Education Budget
Parental Leave
Mental Health Assistance
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect
