We are seeking an experienced software professional to contribute to the design and development of accelerated and distributed implementations of Python APIs for numerical computing. Python has become the de facto programming language for AI, data science, and HPC, through frameworks like NumPy, SciPy, TensorFlow, and PyTorch. NVIDIA has been at the forefront of providing GPU-accelerated implementations of these frameworks' fundamental components.
Join our dynamic team to develop and optimize GPU-accelerated and distributed implementations of Python numerical libraries, supporting Python-based frameworks in various ecosystems. You'll be a crucial member of a team working to unlock the power of distributed GPU computing for scientific computing, data analytics, deep learning, and professional graphics, running on hardware ranging from supercomputers to the cloud.
Responsibilities:
- Work closely with product management and partners to understand use cases and requirements
- Architect, prioritize, and develop accelerated and distributed implementations of numerical algorithms
- Design future-proof Python APIs for accelerated numerical/scientific computing libraries
- Analyze and improve performance on various CPU and GPU architectures
- Prototype integrations of developed APIs into targeted frameworks
- Write effective, maintainable, and well-tested code for production use
- Contribute to the development of runtime systems for multi-GPU computing
Requirements:
- BS, MS, or PhD in Computer Science, Applied Math, Electrical Engineering, or a related field
- 5+ years of relevant industry experience or equivalent academic experience
- Excellent Python, C++, and CUDA programming skills
- Strong understanding of fundamental numerical methods and of dense and sparse array computing
- Deep familiarity with Python numerical computing libraries and accelerated implementations
- Experience developing and publishing Python libraries
- Strong background in parallel programming and performance analysis
Preferred Qualifications:
- Experience with data science, machine learning, and deep learning libraries
- Experience with low-level GPU performance optimization
- Background in distributed applications, tasking, or asynchronous runtimes
- Knowledge of compiler optimization techniques and domain-specific language design
Join NVIDIA to be at the forefront of accelerated computing and contribute to transforming the world's largest industries through AI and digital twins.