Senior Site Reliability Engineer

Microsoft

Microsoft builds the data platform for the age of AI, powering data-first applications and driving a data culture through its Azure Data engineering team.

Vancouver, BC, Canada

CAD $108,100 - $199,700

Site Reliability

Senior Software Engineer

Hybrid

5,000+ Employees

6+ years of experience

AI · Enterprise SaaS · Cloud


Description For Senior Site Reliability Engineer

Microsoft's Azure Data engineering team is seeking a Senior Site Reliability Engineer to join its databases team, working specifically on Azure Cosmos DB. The role is crucial to maintaining Microsoft's operational database systems, which are developer-friendly, mission-critical, and AI-enabled. The position involves working with a globally distributed, massively scalable, multi-model cloud database service designed for planet-scale applications.

The successful candidate will build and optimize solutions that analyze massive amounts of telemetry and service health indicators in near real time, perform automated root cause analysis, and implement the mitigations needed to meet strict Service Level Objectives (SLOs). The role requires collaboration with engineering teams, direct interaction with customers, and a data-driven approach to problem-solving.

Working in Vancouver with a hybrid work arrangement (up to 50% work from home), you'll be part of a team that operates like a startup while having the resources and impact of Microsoft. The position offers competitive compensation (CAD $108,100 - $199,700) and comprehensive benefits, including healthcare, educational resources, and parental leave.

This is an excellent opportunity for experienced engineers who are passionate about service reliability, automation, and working with large-scale distributed systems. The role combines technical expertise with customer interaction, making it ideal for those who enjoy both deep technical work and collaborative problem-solving.

Last updated 4 months ago

Responsibilities For Senior Site Reliability Engineer

Collaborate with engineering teams on building and enhancing tooling and automation solutions
Work with customers to understand pain points around supportability and SLO attainment
Design and implement changes to service telemetry
Enhance the customer-facing experience through proactive alerting
Analyze data and provide operational insights to Design and Product teams

Requirements For Senior Site Reliability Engineer


6+ years of technical experience in software engineering, network engineering, or systems administration
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field
Understanding of observability and MELT (metrics, events, logs, traces) implementation patterns
Experience with Azure Logic Apps and Jupyter Notebooks
5+ years of hands-on experience in Python, Java, or C#
3+ years of operational experience improving service reliability
Systematic approach to problem-solving, with effective communication skills

Benefits For Senior Site Reliability Engineer


Industry-leading healthcare
Educational resources
Discounts on products and services
Savings and investments
Maternity and paternity leave
Generous time away
Giving programs
Opportunities to network and connect