Senior ML Infrastructure Engineer

Building a distributed LLM inference network combining idle GPU capacity worldwide for running large-language models.
$180,000 - $250,000
Machine Learning
Senior Software Engineer
In-Person
11 - 50 Employees
5+ years of experience
AI · Enterprise SaaS

Description For Senior ML Infrastructure Engineer

Kuzco is seeking a Senior ML Infrastructure Engineer to join their innovative team in San Francisco. The company is building a groundbreaking distributed LLM inference network that harnesses idle GPU capacity globally, managing over 5,000 GPUs and hundreds of terabytes of VRAM.

The role focuses on developing large-scale, fault-tolerant systems handling millions of LLM inference requests daily. You'll work at the intersection of distributed systems, machine learning, and resource optimization, designing and implementing core systems that power their globally distributed network.

The team consists of experienced staff-level engineers who have founded and run their own software companies. They value creativity, technical excellence, and humility, working in a high-agency, collaborative environment. The company offers competitive compensation ($180,000-$250,000), equity, and comprehensive benefits.

This position is perfect for someone with strong distributed systems experience, solid programming skills in TypeScript and Python plus one of Go, Rust, or C++, and a passion for ML infrastructure. You'll be working on cutting-edge technology that shapes the future of AI infrastructure, making this an exceptional opportunity for growth and impact in the AI industry.

The in-person work environment in downtown San Francisco provides direct collaboration with a dedicated team that's deeply passionate about their work. If you're excited about building next-generation ML systems at scale and want to be part of a well-funded, fast-growing startup, this role offers the perfect blend of challenge and opportunity.


Responsibilities For Senior ML Infrastructure Engineer

  • Design and implement scalable distributed systems for the inference network
  • Develop models for efficient resource allocation across heterogeneous hardware
  • Optimize network latency, throughput, and availability
  • Build robust logging and metrics systems
  • Conduct architecture and system design reviews
  • Collaborate with founders and stakeholders to improve infrastructure

Requirements For Senior ML Infrastructure Engineer

Python
TypeScript
Go
Rust
Kubernetes
  • Very strong problem-solving skills
  • 5+ years of experience in building high performance systems
  • Strong programming skills in TypeScript, Python, and one of Go, Rust, or C++
  • Solid understanding of distributed systems concepts
  • Knowledge of orchestrators and schedulers like Kubernetes and Nomad
  • Experience using AI tooling in your development workflow
  • Experience with LLM inference engines is a plus
  • Experience with GPU programming and optimization

Benefits For Senior ML Infrastructure Engineer

Medical Insurance
Equity
  • Competitive compensation
  • Equity in high-growth startup
  • Comprehensive benefits
