Deep Learning Engineer - Distributed Task-Based Backends

NVIDIA

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins that transform industries.

Santa Clara, CA, USA

$148,000 - $287,500

Machine Learning

Staff Software Engineer

Remote

5,000+ Employees

5+ years of experience

Description For Deep Learning Engineer - Distributed Task-Based Backends

NVIDIA is seeking a Senior- to Principal-level Deep Learning Engineer to revolutionize distributed backends for major frameworks like PyTorch, JAX, and TensorFlow. This role combines advanced software engineering with cutting-edge AI infrastructure development, focusing on scaling model architectures across thousands of GPUs using task-based runtime systems like Legate, Legion, and Realm.

The position offers a unique opportunity to work at the intersection of high-performance computing and artificial intelligence, developing solutions that will shape the future of distributed AI computing. You'll be responsible for creating framework extensions, optimizing compiler performance, and developing debugging tools for large-scale AI models.

The ideal candidate brings strong expertise in parallel computing, distributed systems, and deep learning frameworks, with proficiency in Python and C++. This role requires both technical depth in GPU computing and the ability to collaborate with enterprise customers and internal teams across NVIDIA.

Working at NVIDIA means joining the world leader in accelerated computing, where you'll contribute to transformative technologies that impact major industries globally. The company offers competitive compensation including a base salary range of $148,000-$287,500 USD, equity, and comprehensive benefits.

This position provides the flexibility of remote work while being part of a team that's pushing the boundaries of what's possible in AI and high-performance computing. It's an excellent opportunity for someone passionate about solving complex technical challenges and making a significant impact on the future of AI infrastructure.

Responsibilities For Deep Learning Engineer - Distributed Task-Based Backends

Develop extensions to popular Deep Learning frameworks for parallelization strategies
Develop compiler optimizations and parallelization heuristics
Develop tools for performance debugging of AI models at large scales
Study and tune Deep Learning training workloads at large scale
Support enterprise customers and partners in scaling models
Collaborate with Deep Learning software and hardware teams
Contribute to runtime systems development for distributed GPU computing

Requirements For Deep Learning Engineer - Distributed Task-Based Backends

BS, MS, or PhD degree in Computer Science, Electrical Engineering, or a related field
5+ years of relevant industry experience or equivalent academic experience after BS
Proficiency in Python and C++ programming
Strong background in parallel and distributed programming, preferably on GPUs
Hands-on development skills using Machine Learning frameworks
Understanding of Deep Learning training in distributed contexts

Benefits For Deep Learning Engineer - Distributed Task-Based Backends

Equity
Benefits package available at nvidia.com/benefits
