
Principal Supercomputing Software Engineer

Microsoft is a global technology company that empowers every person and organization on the planet to achieve more.
United States
$137,600 - $267,000
Principal Software Engineer
Remote
5,000+ Employees
6+ years of experience
AI · Enterprise SaaS

Job Description

The Microsoft Azure AI/HPC team is seeking a Principal Supercomputing Software Engineer to help customers deploy, monitor, profile, and debug applications on hyperscale cloud infrastructure. This role is critical to building and maintaining cloud-native supercomputers, which require specialized tools and techniques to ensure system reliability and performance.

The position involves working with state-of-the-art tools, establishing best practices, driving architectural changes, and influencing the roadmap of software and hardware components. You'll directly impact Azure's supercomputing capabilities, which have already made their mark on the Top500, MLPerf, and Graph500 rankings.

As a Principal Engineer, you'll be responsible for maintaining system reliability, runtime performance, and health while meeting customer SLAs. The role requires deep expertise in HPC systems, cloud infrastructure, and software development. You'll work closely with customers, vendors, and internal teams to drive comprehensive solutions for operating world-class supercomputers in the public cloud.

This is an excellent opportunity for someone passionate about large-scale computing systems and cloud technology. The position offers competitive compensation ($137,600 - $267,000), comprehensive benefits, and the chance to work on cutting-edge technology that powers some of the world's largest supercomputing deployments. Working at Microsoft, you'll be part of a culture that emphasizes growth mindset, innovation, and collaboration to achieve shared goals.

Last updated 3 days ago

Responsibilities For Principal Supercomputing Software Engineer

  • Be part of a comprehensive systems management team focused on operational excellence and customer success
  • Analyze key system metrics and telemetry to proactively identify and debug HPC system issues
  • Build appropriate tooling, help develop processes, and ensure solutions are responsive to emerging user needs
  • Partner with customers, vendors, and other teams within Azure
  • Ensure that the Azure platform is performant, scalable, and resilient
  • Foster a test-driven engineering culture

Requirements For Principal Supercomputing Software Engineer

Python
Java
JavaScript
Linux
  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience
  • 5+ years of experience in operating AI/HPC systems
  • 3+ years of specialized experience with AI/HPC system management OR High-Speed Networks OR HPC Storage OR managing Cloud Infrastructure
  • Experience coding in C, C++, C#, Java, JavaScript, or Python
  • Must pass Microsoft Cloud Background Check

Benefits For Principal Supercomputing Software Engineer

Medical Insurance
Education Budget
Parental Leave
Mental Health Assistance
  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect