Principal Site Reliability Engineer

Microsoft is a global technology company that empowers people and organizations to achieve more through cloud computing and software solutions.
Site Reliability
Principal Software Engineer
Remote
8+ years of experience
Enterprise SaaS · Cloud

Description For Principal Site Reliability Engineer

Microsoft's Azure Customer Experience (CXP) team is seeking a Principal Site Reliability Engineer to join its Observability team. The role is central to ensuring the reliability and performance of customer applications on Microsoft Azure, and focuses on implementing Service Level Objective (SLO) monitoring solutions for top Azure customers. The ideal candidate is customer-obsessed and experienced in cloud computing, with expertise in implementing monitoring solutions and managing SLOs. The role offers a startup-like environment within Microsoft's established cloud platform, emphasizing automation, observability, and proactive monitoring. The team values diversity and inclusion and empowers members to work authentically while achieving their career goals. The position combines technical expertise with customer engagement, requiring both strong engineering skills and excellent communication abilities. It allows up to 100% remote work with 0-25% travel, and offers comprehensive benefits and the opportunity to impact Microsoft's cloud services at scale.

Last updated 9 days ago

Responsibilities For Principal Site Reliability Engineer

  • Design and implement monitoring solutions for Azure customers
  • Manage and implement Service Level Objectives (SLOs)
  • Engage and communicate with customers
  • Advocate for customer needs and work towards resolutions
  • Design and implement observability solutions
  • Debug and launch commercial software products or web services

Requirements For Principal Site Reliability Engineer

Kubernetes
Python
  • Bachelor's or master's degree in Computer Engineering (or equivalent)
  • Proven expertise in implementing and managing Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
  • Experience in designing and implementing monitoring solutions
  • Extensive experience with monitoring tools and platforms
  • Advanced certifications in SRE or related fields
  • Experience with observability tooling (OpenTelemetry, Prometheus, Grafana, Dynatrace, Datadog, Azure Monitor) and with AI/ML
  • Must pass Microsoft Cloud Background Check

Benefits For Principal Site Reliability Engineer

Medical Insurance
Education Budget
Parental Leave
  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect
