Site Reliability Engineer

Baseten

ML infrastructure company backed by top-tier investors, providing a platform for ML teams at enterprises and AI-native companies to power production workloads.

San Francisco, CA, USA • New York, NY, USA

$150,000 - $250,000

Site Reliability

Senior Software Engineer

Hybrid

3+ years of experience

AI · Enterprise SaaS

Description For Site Reliability Engineer

Baseten is a venture-backed ML infrastructure company that has achieved product-market fit and secured Series B funding. We serve ML teams at enterprises and AI-native companies like Descript, Bland.ai, Patreon, Writer, and Robust Intelligence, providing them with a platform for their production workloads that offers superior performance, security, and reliability.

As a Site Reliability Engineer at Baseten, you'll be at the forefront of building and maintaining the infrastructure that powers machine learning operations. Your role will involve creating robust systems and processes to ensure scalability, reliability, and efficiency. You'll work on everything from automation of deployments to monitoring systems and performance optimization.

The position requires a strong background in infrastructure and systems engineering, with particular expertise in Kubernetes and modern DevOps practices. You'll be expected to own projects end-to-end, collaborate across teams, and contribute to establishing best practices for reliability and performance.

What makes this role unique is the opportunity to work in the rapidly evolving field of ML infrastructure while being part of a growing startup backed by prestigious investors like IVP, Spark Capital, and Greylock. You'll gain exposure to various ML startups and have the chance to shape the future of AI infrastructure.

The company offers competitive compensation, including a salary range of $150K-$250K plus equity, comprehensive benefits, and a culture that promotes learning and growth. If you're passionate about building scalable systems and want to be part of transforming how companies deploy and manage ML models, this role offers an exciting opportunity to make a significant impact.

Last updated 2 months ago

Responsibilities For Site Reliability Engineer

Build and maintain scalable infrastructure to support ML model deployment and operation
Establish standards and best practices for reliability and performance
Automate processes for managing CI/CD pipelines
Own products and projects end-to-end
Collaborate with cross-functional teams
Mentor junior team members
Navigate ambiguity and exercise good judgment on tradeoffs
Demonstrate pride, ownership, and accountability

Requirements For Site Reliability Engineer

Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or related field
3+ years of professional work experience in a fast-paced, high-growth environment
Extensive experience with Kubernetes
Experience in building and maintaining scalable infrastructure
Experience with infrastructure-as-code tools and CI/CD tooling
Relevant OSS observability experience is a plus
Ability to own projects end-to-end
Open to learning about machine learning

Benefits For Site Reliability Engineer

Competitive compensation package
Unlimited PTO
401k
Covered healthcare premiums
Opportunity to be part of a rapidly growing startup
Inclusive and supportive work culture
Exposure to various ML startups
