Infrastructure Engineer

Replicate

A platform for creating, deploying, and running machine learning models, making AI accessible to every software developer.

San Francisco, CA, USA

$200,000 - $280,000

Cloud

Mid-Level Software Engineer

In-Person

3+ years of experience


Description For Infrastructure Engineer

Replicate is revolutionizing AI accessibility by building the premier platform for creating, deploying, and running machine learning models. As an Infrastructure Engineer on the Platform team, you'll be at the forefront of making generative AI available to developers worldwide.

The role involves managing the complete lifecycle of ML models, from packaging and deployment to serving, scaling, and monitoring. You'll work on a platform that supports thousands of models and handles millions of daily predictions. This position offers a unique opportunity to build innovative solutions where your decisions have a direct impact.

The technical stack includes Python, Go, Node.js, Kubernetes, Terraform, and databases like Redis, Google BigQuery, and PostgreSQL. You'll be working on critical infrastructure components, including multi-regional traffic management, GPU optimization, and sophisticated task allocation systems.

The ideal candidate brings experience in platform development at scale, understanding of complex systems architecture, and proven ability with Kubernetes operations. While ML/AI production experience is a plus, the role focuses on infrastructure rather than model building. Strong communication skills are essential as you'll be collaborating closely with teams and translating complex concepts into actionable insights.

Based in Replicate's Mission district office in San Francisco, this role offers the chance to be part of building a strong in-person culture while working on cutting-edge AI infrastructure. You'll be joining a team dedicated to democratizing AI technology and making it accessible to developers everywhere.

Last updated 7 months ago

Responsibilities For Infrastructure Engineer

Designing and building the deployment and model-serving platform
Building technology to operationalize ML and AI advancements
Designing systems to maximize utilization and reliability of Kubernetes clusters and GPUs
Owning and optimizing task allocation and queuing across customers
Working on model inference optimization through caching, weights management, and runtime optimizations

Requirements For Infrastructure Engineer

Python

Node.js

Kubernetes

Redis

PostgreSQL

Experience building platforms at scale
Experience with complex systems
Experience designing and implementing developer-friendly APIs
Hands-on experience with Kubernetes
Strong communication and collaboration skills
At least 3 years of full-time software engineering experience