Microsoft's Azure Compute team is seeking a Principal Software Engineer to join its Availability Platform team. The role focuses on keeping Azure VM availability at a 99.99%+ SLA through innovative solutions and data-driven decisions. The team owns the services that measure the health of millions of Azure machines, and it controls repair decisions.
As a Principal Software Engineer, you'll push the boundaries of scale, reliability, availability, and efficiency in cloud computing. The role involves producing comprehensive designs, developing incrementally, and shipping frequently while adapting to customer feedback. You'll help build fault-tolerant distributed systems on datacenter hardware that create the illusion of limitless, always-available resources.
The position offers the opportunity to work with talented engineers, collaborate with data scientists on predictive failure models, and drive critical platform improvements. You'll lead architecture decisions, mentor team members, and handle complex distributed systems challenges at massive scale.
Key responsibilities include partnering with stakeholders across teams, leading service design and architecture, developing high-quality code, and supporting live operations. The role requires strong technical expertise in distributed systems, proven leadership skills, and the ability to make sound decisions in ambiguous situations.
Microsoft offers comprehensive benefits including industry-leading healthcare, educational resources, savings plans, and generous time off. The position supports hybrid work, with the option to work from home up to 100% of the time, and has minimal travel requirements (0-25%).
This is an excellent opportunity for experienced engineers passionate about distributed systems, cloud infrastructure, and technical leadership to make a significant impact on one of Azure's most critical platforms.