ML Engineer Large-scale AI Infrastructure

GenBio

A Silicon Valley startup combining Generative AI with biology and medicine, pioneering pan-modal Large Biological Models (LBM) for healthcare transformation.

San Francisco, CA, USA

Machine Learning

Mid-Level Software Engineer

In-Person

2+ years of experience

AI · Healthcare · Biotech

Description For ML Engineer Large-scale AI Infrastructure

GenBio is a pioneering Silicon Valley startup at the intersection of Generative AI and biomedicine. Headquartered in Silicon Valley, with a presence in Paris, we're revolutionizing healthcare through Large Biological Models (LBMs). Our team of visionary scientists, engineers, and entrepreneurs is dedicated to decoding biology holistically and enabling next-generation, life-transforming solutions.

As our ML Engineer for Large-scale AI Infrastructure, you'll be at the forefront of building and maintaining the computational backbone that powers our breakthrough research. You'll work with cutting-edge GPU clusters, implement distributed training systems, and optimize performance for our large-scale AI models. This role combines expertise in machine learning infrastructure with high-performance computing, requiring both technical depth and collaborative skills.

The ideal candidate will bring strong experience in GPU cluster management, distributed systems, and deep learning frameworks. You'll work alongside leading minds in AI and biological science, contributing to a mission that could fundamentally transform healthcare and biological research. This is an opportunity to join an exceptionally strong R&D team that's leading the charge in LLM and generative AI applications in biomedicine.

We offer a unique environment where innovation meets impact, and your work will directly contribute to advancing the future of biology and medicine through AI. Join us in our mission to pioneer new paradigms in healthcare, working with state-of-the-art technology and alongside world-class experts in both AI and biological sciences.

Responsibilities For ML Engineer Large-scale AI Infrastructure

Design, deploy, and maintain high-performance GPU clusters
Implement distributed computing techniques for parallel training
Tune GPU clusters and deep learning frameworks for optimal performance
Collaborate with data scientists and machine learning engineers
Ensure GPU clusters can scale effectively
Troubleshoot and resolve issues related to GPU clusters
Create and maintain documentation for GPU cluster configuration

Requirements For ML Engineer Large-scale AI Infrastructure

Python

Kubernetes

Master's or Ph.D. in computer science or a related field, with a focus on High-Performance Computing, Distributed Systems, or Deep Learning
2+ years of proven experience managing GPU clusters
Strong expertise in distributed deep learning and parallel training techniques
Proficiency in PyTorch, Megatron-LM, DeepSpeed
Programming skills in Python and experience with GPU-accelerated libraries
Knowledge of performance profiling and optimization tools for HPC and deep learning
Familiarity with resource management and scheduling systems
Strong background in distributed systems, cloud computing, and containerization

