Infrastructure Engineer

A platform for creating, deploying, and running machine learning models, making AI accessible to every software developer.
$200,000 - $280,000
Cloud
Mid-Level Software Engineer
In-Person
3+ years of experience
AI

Description For Infrastructure Engineer

Replicate builds a platform for creating, deploying, and running machine learning models. As an Infrastructure Engineer on the Platform team, you'll help make generative AI available to developers worldwide.

The role covers the complete lifecycle of ML models, from packaging and deployment to serving, scaling, and monitoring. You'll work on a platform that supports thousands of models and handles millions of predictions per day, and your design decisions will have a direct impact on it.

The technical stack includes Python, Go, Node.js, Kubernetes, Terraform, and databases like Redis, Google BigQuery, and PostgreSQL. You'll be working on critical infrastructure components, including multi-regional traffic management, GPU optimization, and sophisticated task allocation systems.

The ideal candidate brings experience in platform development at scale, understanding of complex systems architecture, and proven ability with Kubernetes operations. While ML/AI production experience is a plus, the role focuses on infrastructure rather than model building. Strong communication skills are essential as you'll be collaborating closely with teams and translating complex concepts into actionable insights.

Based in Replicate's Mission District office in San Francisco, this role offers the chance to help build a strong in-person culture while working on AI infrastructure. You'll be joining a team dedicated to making AI technology accessible to developers everywhere.

Last updated 3 months ago

Responsibilities For Infrastructure Engineer

  • Designing and building the deployment and model-serving platform
  • Building technology to put ML and AI advancements into production
  • Designing systems to maximize utilization and reliability of Kubernetes clusters and GPUs
  • Owning and optimizing task allocation and queuing across customers
  • Working on model inference optimization through caching, weights management, and runtime optimizations

Requirements For Infrastructure Engineer

  • Python
  • Go
  • Node.js
  • Kubernetes
  • Redis
  • PostgreSQL
  • Experience building platforms at scale
  • Experience with complex systems
  • Experience designing and implementing developer-friendly APIs
  • Hands-on experience with Kubernetes
  • Strong communication and collaboration skills
  • At least 3 years of full-time software engineering experience
