Site Reliability Engineer

Baseten is a growing team building ML infrastructure for enterprises and AI-native companies, backed by top-tier investors.
$150,000 - $250,000
DevOps
Senior Software Engineer
Hybrid
11 - 50 Employees
5+ years of experience

Description For Site Reliability Engineer

Baseten is seeking a Site Reliability Engineer to build and maintain robust systems that keep its infrastructure scalable, reliable, and efficient. As an SRE, you'll automate deployments, monitor systems, optimize performance, and manage incidents. You'll also collaborate closely with users, learning from the challenges they face in operationalizing ML and using what you learn to improve the Baseten platform.

Key responsibilities include:

  • Building and maintaining scalable infrastructure
  • Working extensively with Kubernetes
  • Applying automation for CI/CD pipelines
  • Establishing standards for reliability and performance
  • Learning about ML infrastructure (prior experience not required)

The ideal candidate should:

  • Own products and projects end-to-end
  • Navigate ambiguity comfortably
  • Focus on customer problems and create simple, elegant solutions
  • Exercise good judgment on technical tradeoffs
  • Demonstrate pride, ownership, and accountability

Tech stack:

  • Backend: Go, Python, Postgres
  • Platform: Kubernetes, Go, Postgres, Redis, Kafka
  • Infrastructure: GitOps, Flux, Terraform, AWS/GCP

Baseten offers:

  • Competitive compensation package (Unlimited PTO, 401k, covered healthcare premiums)
  • Opportunity to grow in a rapidly expanding startup
  • Inclusive and supportive work culture
  • Exposure to various ML startups

Baseten is committed to fostering a diverse and inclusive workplace, providing equal employment opportunities to all employees and applicants.

Last updated 5 months ago

Responsibilities For Site Reliability Engineer

  • Build and maintain scalable infrastructure
  • Work extensively with Kubernetes
  • Apply automation for CI/CD pipelines
  • Establish standards for reliability and performance
  • Collaborate with users to improve the Baseten platform
  • Own products and projects end-to-end
  • Navigate ambiguity and focus on customer problems
  • Create simple, elegant solutions that avoid unnecessary complexity

Requirements For Site Reliability Engineer

  • Go, Python, Kubernetes, PostgreSQL, Redis, Kafka
  • Experience building and maintaining scalable infrastructure
  • Extensive experience with Kubernetes
  • Knowledge of when and how to apply automation (e.g., for CI/CD pipelines)
  • Experience establishing standards and best practices for reliability and performance
  • Willingness to learn about ML infrastructure

Benefits For Site Reliability Engineer

  • Unlimited PTO
  • 401k
  • Covered healthcare premiums
  • Equity
