Site Reliability Engineer

Baseten is a growing team building ML infrastructure for enterprises and AI-native companies, backed by top-tier investors.
$150,000 - $250,000
DevOps
Senior Software Engineer
Hybrid
11 - 50 Employees
5+ years of experience

Description For Site Reliability Engineer

Baseten is seeking a Site Reliability Engineer to build and maintain the systems that keep its infrastructure scalable, reliable, and efficient. As an SRE, you'll automate deployments, monitor systems, optimize performance, and manage incidents. You'll also collaborate closely with users, learning from the challenges they face in operationalizing ML and using that to improve the Baseten platform.

Key responsibilities include:

  • Building and maintaining scalable infrastructure
  • Working extensively with Kubernetes
  • Applying automation for CI/CD pipelines
  • Establishing standards for reliability and performance
  • Learning about ML infrastructure (prior experience not required)

The ideal candidate should:

  • Own products and projects end-to-end
  • Navigate ambiguity comfortably
  • Focus on customer problems and create simple, elegant solutions
  • Exercise good judgment on technical tradeoffs
  • Demonstrate pride, ownership, and accountability

Tech stack:

  • Backend: Go, Python, Postgres
  • Platform: Kubernetes, Go, Postgres, Redis, Kafka
  • Infrastructure: GitOps, Flux, Terraform, AWS/GCP

Baseten offers:

  • Competitive compensation package (unlimited PTO, 401(k), covered healthcare premiums)
  • Opportunity to grow in a rapidly expanding startup
  • Inclusive and supportive work culture
  • Exposure to various ML startups

Baseten is committed to fostering a diverse and inclusive workplace, providing equal employment opportunities to all employees and applicants.

Last updated 6 months ago

Responsibilities For Site Reliability Engineer

  • Build and maintain scalable infrastructure
  • Work extensively with Kubernetes
  • Apply automation for CI/CD pipelines
  • Establish standards for reliability and performance
  • Collaborate with users to improve the Baseten platform
  • Own products and projects end-to-end
  • Navigate ambiguity and focus on customer problems
  • Create simple, elegant solutions avoiding unnecessary complexity

Requirements For Site Reliability Engineer

  • Technologies: Go, Python, Kubernetes, PostgreSQL, Redis, Kafka
  • Experience building and maintaining scalable infrastructure
  • Extensive experience with Kubernetes
  • Knowledge of when and how to apply automation (e.g., for CI/CD pipelines)
  • Experience establishing standards and best practices for reliability and performance
  • Willingness to learn about ML infrastructure

Benefits For Site Reliability Engineer

  • Unlimited PTO
  • 401(k)
  • Covered healthcare premiums
  • Equity
