Principal Engineer, Distributed Machine Learning

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins.
$272,000 - $425,500
Machine Learning
Principal Software Engineer
In-Person
12+ years of experience
AI · Enterprise SaaS

Description For Principal Engineer, Distributed Machine Learning

NVIDIA, the world leader in accelerated computing, is seeking a Principal Engineer to join its Distributed Machine Learning team. This role focuses on GPU-accelerated Apache Spark and distributed machine learning solutions, combining cutting-edge technology with practical business applications.

The position offers an opportunity to work on significant challenges in distributed ML/DL, making these technologies more accessible and efficient. You'll be at the forefront of developing GPU-accelerated distributed machine learning solutions, working with open-source communities, and improving existing frameworks like XGBoost, RAPIDS cuML, PyTorch, and TensorFlow.

As a Principal Engineer, you'll lead the design and development of new APIs and libraries, optimize performance for distributed training and inference, and contribute to major open-source projects. The role requires extensive experience in distributed systems, machine learning, and software development, with particular emphasis on technologies like Apache Spark, Kubernetes, and GPU computing.

The position offers a competitive compensation package, including a substantial base salary range of $272,000 - $425,500, plus equity. You'll be working with some of the most talented professionals in the technology industry, in an environment that values creativity and autonomy.

NVIDIA's commitment to fostering a diverse work environment and its position at the forefront of AI and accelerated computing make this an exceptional opportunity for a seasoned professional looking to make a significant impact in the field of distributed machine learning.

Last updated 3 months ago

Responsibilities For Principal Engineer, Distributed Machine Learning

  • Design and develop new user-friendly APIs and libraries for GPU-enabled Spark clusters
  • Design and develop GPU-accelerated ML libraries for distributed training and inference
  • Demonstrate superior performance on industry-standard benchmarks
  • Make technical contributions to open-source projects
  • Work with partners and customers on deploying distributed ML algorithms
  • Keep up with published advances in distributed ML systems
  • Provide technical mentorship to a team of engineers

Requirements For Principal Engineer, Distributed Machine Learning

Python · Kubernetes · Java · Scala
  • BS, MS, or PhD in Computer Science, Computer Engineering, or related field
  • 12+ years of work or research experience in software development
  • 5+ years of experience as a technical lead in distributed machine learning and/or deep learning
  • 3+ years of open-source development experience
  • 3+ years hands-on experience with Spark MLlib, XGBoost, and/or PyTorch
  • Knowledge of internals of Apache Spark MLlib
  • Experience with Kubernetes, YARN, Spark, and/or Ray for distributed ML orchestration
  • Proven technical skills in distributed systems
  • Excellent programming skills in C++, Scala, and Python
  • Familiarity with agile software development practices

Benefits For Principal Engineer, Distributed Machine Learning

  • Equity

Jobs Related To NVIDIA Principal Engineer, Distributed Machine Learning

Principal DGX Cloud Machine Learning Architect

Principal ML Architect role at NVIDIA focusing on optimizing generative AI models for DGX Cloud, requiring 15+ years of experience and offering competitive compensation.

Principal Engineer for AI Software Resiliency

Lead AI software resiliency development for the world's most powerful AI supercomputers at NVIDIA.

Distinguished Engineer, AI Resiliency Lead

Lead AI Resiliency engineering role at NVIDIA, focusing on developing resilient software features for large-scale AI model training with competitive compensation.

Senior Product Architect, HPC and AI

Senior Product Architect position at NVIDIA focusing on HPC and AI infrastructure design, offering competitive compensation and opportunity to shape the future of AI technology.

Senior Deep Learning Performance Architect

Senior Deep Learning Performance Architect role at NVIDIA focusing on developing next-generation AI architectures and optimizing deep learning performance.