Senior Site Reliability Engineer

Microsoft builds the data platform for the age of AI, powering data-first applications and driving a data culture through its Azure Data engineering team.
CAD $108,100 - $199,700
Site Reliability
Senior Software Engineer
Hybrid
5,000+ Employees
6+ years of experience
AI · Enterprise SaaS · Cloud

Description For Senior Site Reliability Engineer

Microsoft's Azure Data engineering team is seeking a Senior Site Reliability Engineer to join its databases team, working specifically on Azure Cosmos DB. The role is central to maintaining Microsoft's operational database systems, with a focus on developer-friendly, mission-critical, AI-enabled operational databases. The position involves working with a globally distributed, massively scalable, multi-model cloud database service designed for planet-scale applications.

The ideal candidate will be responsible for building and optimizing solutions that analyze massive amounts of telemetry and service health indicators in near real-time, performing automated root cause analysis, and implementing necessary mitigations to maintain strict Service Level Objectives (SLOs). The role requires collaboration with engineering teams, customer interaction, and a data-driven approach to problem-solving.

Working in Vancouver with a hybrid work arrangement (up to 50% work from home), you'll be part of a team that operates like a startup while having the resources and impact of Microsoft. The position offers competitive compensation (CAD $108,100 - $199,700) and comprehensive benefits, including healthcare, educational resources, and parental leave.

This is an excellent opportunity for experienced engineers who are passionate about service reliability, automation, and working with large-scale distributed systems. The role combines technical expertise with customer interaction, making it ideal for those who enjoy both deep technical work and collaborative problem-solving.

Last updated 4 months ago

Responsibilities For Senior Site Reliability Engineer

  • Collaborate with engineering teams on building and enhancing tooling and automation solutions
  • Work with customers to understand pain points around Supportability and SLO attainment
  • Design and implement changes to service telemetry
  • Enhance the customer-facing experience through proactive alerting
  • Analyze data and provide operational insights to Design and Product teams

Requirements For Senior Site Reliability Engineer

Python
Java
  • 6+ years technical experience in software engineering, network engineering, or systems administration
  • Bachelor's/Master's Degree in Computer Science, Information Technology, or related field
  • Understanding of observability and MELT (metrics, events, logs, traces) implementation patterns
  • Experience with Logic Apps and Jupyter Notebooks
  • 5+ years of hands-on experience in Python/Java/C#
  • 3+ years of operational experience in improving Service Reliability
  • Systematic problem-solving approach with effective communication skills

Benefits For Senior Site Reliability Engineer

Medical Insurance
Education Budget
Parental Leave
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect
