Senior Reliability Engineer

Microsoft, a leading global technology company, empowers every person and organization on the planet to achieve more.
Site Reliability
Staff Software Engineer
In-Person
8+ years of experience
Enterprise SaaS · Cloud

Description For Senior Reliability Engineer

Microsoft's Azure Data team is seeking a Senior Reliability Engineer to join their world-class engineering team within the Azure SQL Database group. This role focuses on ensuring the reliability and serviceability of Azure SQL Database and Managed Instance services. You'll be working with cutting-edge database technologies that support massive scale deployments across the globe.

As a Site Reliability Engineer, you'll be responsible for maintaining service reliability and availability, designing automated issue resolution systems, and collaborating with feature teams to implement reliable and serviceable features. The position offers unique challenges in distributed systems, handling databases that support hundreds of terabytes of data and thousands of users.

The role combines deep technical expertise in SQL Server/Azure SQL Database with strong troubleshooting skills. You'll be working on problems spanning the entire database technology stack - from connectivity and high availability to query processing and transaction management. This is an opportunity to impact a critical service used by organizations worldwide while working alongside some of the industry's best engineers.

The position requires strong debugging skills, extensive experience with SQL Server internals, and the ability to work effectively across teams. You'll be part of Microsoft's C+AI Azure organization, working in an agile environment to solve complex distributed systems challenges. The role offers excellent growth opportunities and the chance to influence the future of cloud database services.

Benefits include industry-leading healthcare, educational resources, parental leave, and investment opportunities. This is an ideal position for someone passionate about large-scale systems, problem-solving, and making a significant impact on cloud infrastructure.

Last updated 13 days ago

Responsibilities For Senior Reliability Engineer

  • Act as subject matter expert for Azure SQL Database/Managed Instance services
  • Identify opportunities and implement automation to resolve live-site incidents
  • Design and implement solutions to improve service health and reliability
  • Own, triage, investigate, and resolve service issues
  • Author and maintain functional and technical documentation
  • Interact with customers for escalated support issues
  • Mentor others, and develop and deliver training
  • Meet periodic on-call responsibilities

Requirements For Senior Reliability Engineer

Python
Java
  • 8+ years of software development or SQL product support experience
  • 3+ years of experience using debugging tools such as WinDbg, Visual Studio, and Xperf
  • Deep understanding of Windows Operating System level concepts
  • Proficiency in programming with managed languages such as C# or Java
  • BS/MS in Computer Science, Engineering, or equivalent industry experience
  • Experience in working with multiple teams and coordinating large projects
  • Deep understanding of SQL Server/Azure SQL Database
  • Demonstrated influence outside their immediate team

Benefits For Senior Reliability Engineer

Medical Insurance
Parental Leave
Education Budget
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

