Deep Learning Engineer - Distributed Task-Based Backends

NVIDIA is the world leader in accelerated computing, pioneering solutions in AI and digital twins that transform industries.
$148,000 - $287,500
Machine Learning
Staff Software Engineer
Remote
5,000+ Employees
5+ years of experience
AI

Description For Deep Learning Engineer - Distributed Task-Based Backends

NVIDIA is seeking a Senior to Principal level Deep Learning Engineer to redesign distributed backends for major frameworks such as PyTorch, JAX, and TensorFlow. This role combines advanced software engineering with AI infrastructure development, focusing on scaling model architectures across thousands of GPUs using task-based runtime systems such as Legate, Legion, and Realm.

The position offers an opportunity to work at the intersection of high-performance computing and artificial intelligence, developing solutions for distributed AI computing. You'll be responsible for creating framework extensions, developing compiler optimizations and parallelization heuristics, and building performance-debugging tools for large-scale AI models.

The ideal candidate brings strong expertise in parallel computing, distributed systems, and deep learning frameworks, with proficiency in Python and C++. This role requires both technical depth in GPU computing and the ability to collaborate with enterprise customers and internal teams across NVIDIA.

Working at NVIDIA means joining the world leader in accelerated computing, where you'll contribute to transformative technologies that impact major industries globally. The company offers competitive compensation including a base salary range of $148,000-$287,500 USD, equity, and comprehensive benefits.

This position provides the flexibility of remote work while being part of a team that's pushing the boundaries of what's possible in AI and high-performance computing. It's an excellent opportunity for someone passionate about solving complex technical challenges and making a significant impact on the future of AI infrastructure.


Responsibilities For Deep Learning Engineer - Distributed Task-Based Backends

  • Develop extensions to popular Deep Learning frameworks for parallelization strategies
  • Develop compiler optimizations and parallelization heuristics
  • Develop tools for performance debugging of AI models at large scales
  • Study and tune Deep Learning training workloads at large scale
  • Support enterprise customers and partners in scaling models
  • Collaborate with Deep Learning software and hardware teams
  • Contribute to runtime systems development for distributed GPU computing

Requirements For Deep Learning Engineer - Distributed Task-Based Backends

  • BS, MS or PhD degree in Computer Science, Electrical Engineering or related field
  • 5+ years of relevant industry experience or equivalent academic experience after BS
  • Proficient with Python and C++ programming
  • Strong background with parallel and distributed programming, preferably on GPUs
  • Hands-on development skills using Machine Learning frameworks
  • Understanding of Deep Learning training in distributed contexts

Benefits For Deep Learning Engineer - Distributed Task-Based Backends

  • Equity
  • Benefits package available at nvidia.com/benefits
