Senior Site Reliability Engineer

Microsoft empowers every person and organization on the planet to achieve more through technology and cloud solutions.
Site Reliability
Senior Software Engineer
Hybrid
6+ years of experience
Enterprise SaaS · Cloud

Description For Senior Site Reliability Engineer

Microsoft's M365 COSMIC team is seeking a Senior Site Reliability Engineer to join its platform team. The role focuses on maintaining and improving a global-scale managed-runtime environment, built on Azure Kubernetes Service, for the Microsoft Substrate service and its developers. As an SRE, you'll be responsible for ensuring platform health, managing upgrades, and implementing automation for incident remediation. The position offers the opportunity to work with current cloud technology while maintaining critical infrastructure components.

The ideal candidate will bring strong technical expertise in software engineering or systems administration, with particular emphasis on cloud services and Kubernetes. You'll be part of a team that designs, builds, and operates solutions that let Substrate service teams focus on their core business requirements rather than on infrastructure concerns.

Working in a hybrid environment with the flexibility to work from home up to 50% of the time, you'll collaborate with cross-functional teams to improve platform stability and efficiency. Microsoft offers comprehensive benefits, including industry-leading healthcare, educational resources, and parental leave, along with a strong culture of inclusion and innovation.

This role presents an excellent opportunity for experienced engineers who want to impact Microsoft's cloud infrastructure at a global scale while working with the latest technologies in cloud computing and container orchestration. The position combines technical challenges with the opportunity to contribute to Microsoft's mission of empowering every person and organization on the planet to achieve more.

Responsibilities For Senior Site Reliability Engineer

  • Keep platform components up to date, incorporating dependencies from other applications and tech stacks, and debug any issues that arise from these upgrades
  • Continuously improve the platform by identifying patterns in service alerts and incidents and building auto-remediation solutions
  • Build dashboards and alerts to identify issues faster and keep system health in check
  • Collaborate with cross-functional teams to define, design, and ship new features that keep the platform healthy and stable

Requirements For Senior Site Reliability Engineer

Kubernetes
Linux
  • 6+ years of technical experience in software engineering, network engineering, or systems administration, OR a Bachelor's degree with 3+ years of experience, OR a Master's degree with 2+ years of experience
  • Experience with or exposure to Agile and iterative development processes
  • Must pass Microsoft Cloud Background Check
  • Cloud and services experience, including Azure
  • Working knowledge of Kubernetes

Benefits For Senior Site Reliability Engineer

Medical Insurance
Education Budget
Parental Leave
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect
