ML Engineer Large-scale AI Infrastructure

A Silicon Valley startup combining Generative AI with biology and medicine, pioneering pan-modal Large Biological Models (LBM) for healthcare transformation.
Machine Learning
Mid-Level Software Engineer
In-Person
2+ years of experience
AI · Healthcare · Biotech

Description For ML Engineer Large-scale AI Infrastructure

GenBio is a pioneering startup at the intersection of Generative AI and biomedicine. Headquartered in Silicon Valley with a presence in Paris, we're revolutionizing healthcare through Large Biological Models (LBM). Our team of visionary scientists, engineers, and entrepreneurs is dedicated to decoding biology holistically and enabling next-generation life-transforming solutions.

As our ML Engineer for Large-scale AI Infrastructure, you'll be at the forefront of building and maintaining the computational backbone that powers our breakthrough research. You'll work with cutting-edge GPU clusters, implement distributed training systems, and optimize performance for our large-scale AI models. This role combines expertise in machine learning infrastructure with high-performance computing, requiring both technical depth and collaborative skills.

The ideal candidate will bring strong experience in GPU cluster management, distributed systems, and deep learning frameworks. You'll work alongside leading minds in AI and biological science, contributing to a mission that could fundamentally transform healthcare and biological research. This is an opportunity to join an exceptionally strong R&D team that's leading the charge in applying LLMs and generative AI to biomedicine.

We offer a unique environment where innovation meets impact, and your work will directly contribute to advancing the future of biology and medicine through AI. Join us in our mission to pioneer new paradigms in healthcare, working with state-of-the-art technology and alongside world-class experts in both AI and biological sciences.

Last updated 14 days ago

Responsibilities For ML Engineer Large-scale AI Infrastructure

  • Design, deploy, and maintain high-performance GPU clusters
  • Implement distributed computing techniques for parallel training
  • Tune GPU clusters and deep learning frameworks for optimal performance
  • Collaborate with data scientists and machine learning engineers
  • Ensure GPU clusters can scale effectively
  • Troubleshoot and resolve issues related to GPU clusters
  • Create and maintain documentation for GPU cluster configuration

Requirements For ML Engineer Large-scale AI Infrastructure

Python · Kubernetes
  • Master's or Ph.D. degree in computer science or a related field, with a focus on High-Performance Computing, Distributed Systems, or Deep Learning
  • 2+ years of proven experience managing GPU clusters
  • Strong expertise in distributed deep learning and parallel training techniques
  • Proficiency in PyTorch, Megatron-LM, and DeepSpeed
  • Programming skills in Python and experience with GPU-accelerated libraries
  • Knowledge of performance profiling and optimization tools for HPC and deep learning
  • Familiarity with resource management and scheduling systems
  • Strong background in distributed systems, cloud computing, and containerization

