Site Reliability Engineer

ML infrastructure company backed by top-tier investors, providing a platform that ML teams at enterprises and AI-native companies use to power production workloads.
$150,000 - $250,000
Site Reliability
Senior Software Engineer
Hybrid
3+ years of experience
AI · Enterprise SaaS

Description For Site Reliability Engineer

Baseten is a venture-backed ML infrastructure company that has achieved product-market fit and secured Series B funding. We serve ML teams at enterprises and AI-native companies like Descript, Bland.ai, Patreon, Writer, and Robust Intelligence, providing them with a platform for their production workloads that offers superior performance, security, and reliability.

As a Site Reliability Engineer at Baseten, you'll be at the forefront of building and maintaining the infrastructure that powers machine learning operations. Your role will involve creating robust systems and processes to ensure scalability, reliability, and efficiency. You'll work on everything from deployment automation to monitoring systems and performance optimization.

The position requires a strong background in infrastructure and systems engineering, with particular expertise in Kubernetes and modern DevOps practices. You'll be expected to own projects end-to-end, collaborate across teams, and contribute to establishing best practices for reliability and performance.

What makes this role unique is the opportunity to work in the rapidly evolving field of ML infrastructure while being part of a growing startup backed by prestigious investors like IVP, Spark Capital, and Greylock. You'll gain exposure to various ML startups and have the chance to shape the future of AI infrastructure.

The company offers competitive compensation, including a salary range of $150K-$250K plus equity, comprehensive benefits, and a culture that promotes learning and growth. If you're passionate about building scalable systems and want to be part of transforming how companies deploy and manage ML models, this role offers an exciting opportunity to make a significant impact.

Last updated 2 months ago

Responsibilities For Site Reliability Engineer

  • Build and maintain scalable infrastructure to support ML model deployment and operation
  • Establish standards and best practices for reliability and performance
  • Automate processes for managing CI/CD pipelines
  • Own products and projects end-to-end
  • Collaborate with cross-functional teams
  • Mentor junior team members
  • Navigate ambiguity and exercise good judgment on tradeoffs
  • Demonstrate pride, ownership, and accountability

Requirements For Site Reliability Engineer

  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or related field
  • 3+ years of professional work experience in a fast-paced, high-growth environment
  • Extensive experience with Kubernetes
  • Experience in building and maintaining scalable infrastructure
  • Experience with infrastructure-as-code tools and CI/CD tooling
  • Experience with open-source (OSS) observability tooling is a plus
  • Ability to own projects end-to-end
  • Open to learning about machine learning

Benefits For Site Reliability Engineer

  • Competitive compensation package
  • Unlimited PTO
  • 401k
  • Covered healthcare premiums
  • Opportunity to be part of a rapidly growing startup
  • Inclusive and supportive work culture
  • Exposure to various ML startups
