Microsoft's Cloud Operations & Innovation (CO+I) team is seeking a Senior Site Reliability Engineer to join its Cloud Infrastructure Health team. This role is crucial in powering Microsoft's cloud services through unified global datacenters, which enable approximately 30% of Microsoft's revenue through Commercial Cloud.
The position involves working with state-of-the-art distributed systems that handle petabyte-scale telemetry, using Machine Learning and traditional software to meet Cloud Availability and Safety goals. You'll be responsible for analyzing telemetry data from datacenter critical environments and secondary signals, in both real-time and offline scenarios, to produce time-sensitive insights that directly affect Cloud Operations.
As a Senior SRE, you'll identify and deliver software improvements using your expertise in software development, complexity analysis, and scalable system design. The role requires strong collaboration skills to work with other engineering teams, ensuring services and systems maintain high stability and performance. You'll be responsible for developing code, scripts, and tools that reduce operational burden through automation, particularly in areas such as onboarding system capabilities to newer datacenters.
Key responsibilities include participating in on-call rotations, resolving live site incidents, and documenting solutions that prevent issue recurrence. You'll also be involved in capacity planning, pattern identification, and trend analysis to drive continuous improvement.
The position offers competitive compensation, comprehensive benefits, and the opportunity to work in a hybrid environment with up to 50% work-from-home flexibility. You'll be part of Microsoft's mission to empower every person and organization on the planet to achieve more, working in an inclusive environment that values a growth mindset, innovation, and collaboration.
This role is well suited to someone with strong technical experience in software engineering or systems administration who is passionate about cloud infrastructure and enjoys working with complex distributed systems. The position offers excellent growth opportunities and the chance to make a significant impact on Microsoft's cloud computing transformation.