Senior ML Infrastructure Engineer

Kuzco

Building a distributed LLM inference network combining idle GPU capacity worldwide for running large language models.

San Francisco, CA, USA

$180,000 - $250,000

Machine Learning

Senior Software Engineer

In-Person

11 - 50 Employees

5+ years of experience

AI · Enterprise SaaS

Description For Senior ML Infrastructure Engineer

Kuzco is seeking a Senior ML Infrastructure Engineer to join its team in San Francisco. The company is building a distributed LLM inference network that harnesses idle GPU capacity globally, managing over 5,000 GPUs and hundreds of terabytes of VRAM.

The role focuses on developing large-scale, fault-tolerant systems handling millions of LLM inference requests daily. You'll work at the intersection of distributed systems, machine learning, and resource optimization, designing and implementing the core systems that power Kuzco's globally distributed network.

The team consists of experienced staff-level engineers who have founded and run their own software companies. They value creativity, technical excellence, and humility, working in a high-agency, collaborative environment. The company offers competitive compensation ($180,000-$250,000), equity, and comprehensive benefits.

This position suits someone with strong distributed systems experience, strong programming skills in TypeScript, Python, and one of Go, Rust, or C++, and a passion for ML infrastructure. You'll be working on technology that shapes the future of AI infrastructure, with room for growth and impact in the AI industry.

The in-person work environment in downtown San Francisco provides direct collaboration with a dedicated team that's deeply passionate about their work. If you're excited about building next-generation ML systems at scale and want to be part of a well-funded, fast-growing startup, this role offers the perfect blend of challenge and opportunity.

Last updated a month ago

Responsibilities For Senior ML Infrastructure Engineer

Design and implement scalable distributed systems for the inference network
Develop models for efficient resource allocation across heterogeneous hardware
Optimize network latency, throughput, and availability
Build robust logging and metrics systems
Conduct reviews of architecture and system design
Collaborate with founders and stakeholders to improve infrastructure

Requirements For Senior ML Infrastructure Engineer

Python · TypeScript · Rust · Kubernetes

Very strong problem-solving skills
5+ years of experience building high-performance systems
Strong programming skills in TypeScript, Python, and one of Go, Rust, or C++
Solid understanding of distributed systems concepts
Knowledge of orchestrators and schedulers like Kubernetes and Nomad
Use of AI tooling in development workflow
Experience with LLM inference engines is a plus
Experience with GPU programming and optimization

Benefits For Senior ML Infrastructure Engineer

Medical Insurance · Equity

Competitive compensation
Equity in high-growth startup
Comprehensive benefits
