Principal Engineer, Distributed Machine Learning

NVIDIA

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins.

San Francisco, CA, USA

$272,000 - $425,500

Machine Learning

Principal Software Engineer

In-Person

12+ years of experience

AI · Enterprise SaaS

Description For Principal Engineer, Distributed Machine Learning

NVIDIA, the world leader in accelerated computing, is seeking a Principal Engineer to join its Distributed Machine Learning team. This role focuses on GPU-accelerated Apache Spark and distributed machine learning solutions, combining cutting-edge technology with practical business applications.

The position offers an opportunity to work on significant challenges in distributed ML/DL, making these technologies more accessible and efficient. You'll be at the forefront of developing GPU-accelerated distributed machine learning solutions, working with open-source communities, and improving existing frameworks like XGBoost, RAPIDS cuML, PyTorch, and TensorFlow.

As a Principal Engineer, you'll lead the design and development of new APIs and libraries, optimize performance for distributed training and inference, and contribute to major open-source projects. The role requires extensive experience in distributed systems, machine learning, and software development, with particular emphasis on technologies like Apache Spark, Kubernetes, and GPU computing.

The position offers a competitive compensation package, including a substantial base salary range of $272,000 - $425,500, plus equity. You'll be working with some of the most talented professionals in the technology industry, in an environment that values creativity and autonomy.

NVIDIA's commitment to fostering a diverse work environment and its position at the forefront of AI and accelerated computing make this an exceptional opportunity for a seasoned professional looking to make a significant impact in the field of distributed machine learning.

Last updated 3 months ago

Responsibilities For Principal Engineer, Distributed Machine Learning

Design and develop new user-friendly APIs and libraries for GPU-enabled Spark clusters
Design and develop GPU-accelerated ML libraries for distributed training and inference
Demonstrate superior performance on industry-standard benchmarks
Make technical contributions to open source projects
Work with partners and customers on deploying distributed ML algorithms
Keep up with published advances in distributed ML systems
Provide technical mentorship to a team of engineers

Requirements For Principal Engineer, Distributed Machine Learning

Python

Kubernetes

Java

Scala

BS, MS, or PhD in Computer Science, Computer Engineering, or related field
12+ years of work or research experience in software development
5+ years of experience as a technical lead in distributed machine learning and/or deep learning
3+ years of open source development experience
3+ years hands-on experience with Spark MLlib, XGBoost, and/or PyTorch
Knowledge of internals of Apache Spark MLlib
Experience with Kubernetes, YARN, Spark, and/or Ray for distributed ML orchestration
Proven technical skills in distributed systems
Excellent programming skills in C++, Scala, and Python
Familiarity with agile software development practices

Benefits For Principal Engineer, Distributed Machine Learning

Equity

