NVIDIA is at the forefront of the generative AI revolution. The Inference Benchmarking (IB) team focuses on advanced inference server performance for Large Language Models (LLMs). As a Deep Learning Architect for LLM Inference, you will characterize the latest LLMs and inference servers, profile GPU kernel-level performance, develop analysis tools, contribute to deep learning software projects, verify TRT-LLM performance, collaborate with performance marketing teams and AI startup engineers, and help guide the direction of inference serving across the company.
Key responsibilities include:
- Characterizing LLMs and inference servers such as vLLM and DeepSpeed-MII
- Creating content to highlight TRT-LLM achievements
- Collaborating with AI startup engineers
- Profiling GPU kernel-level performance and identifying optimization opportunities
- Developing profiling and analysis software tools
- Contributing to projects like PyTorch, vLLM, and LLMPerf
- Verifying TRT-LLM performance for new GPU product launches
- Collaborating across teams to ensure world-class performance
Requirements:
- Master's or PhD in Computer Science, Electrical Engineering, or related fields
- Knowledge of deep learning inference serving, PyTorch, and compiler optimization techniques
- Proficiency in C++ and Python, familiarity with CUDA
- Experience with LLMs and an understanding of their performance challenges
- Understanding of CPU and GPU microarchitecture
- Experience with complex software projects
Preferred qualifications:
- Drive to improve software and hardware performance
- History of developing workplace efficiency tools
- Experience with database and visualization tools such as D3.js
NVIDIA offers a competitive base salary range of $104,000 - $189,750 USD, along with equity and comprehensive benefits. Join a team of highly skilled professionals at one of the technology world's most desirable employers.