Staff Site Reliability Engineer

Replicant

Leader in Contact Center Automation, helping companies automate common customer service calls and empowering agents to focus on complex challenges.

Site Reliability

Staff Software Engineer

Remote

AI · Enterprise SaaS

Description For Staff Site Reliability Engineer

Replicant, the leader in Contact Center Automation, is seeking a Staff Site Reliability Engineer to scale our infrastructure and systems. We use AI to automate customer service calls and are now leveraging Large Language Models (LLMs) to transform the industry. Our tech stack includes TypeScript/Node.js and Python, running on Kubernetes on GCP, along with tools such as Helm, Terraform, Datadog, and Prometheus.

As a Staff SRE, you'll:

Execute long-term initiatives for smooth operation and high availability of production systems
Identify and resolve performance bottlenecks
Partner with engineering teams to improve reliability and scalability
Participate in on-call rotation
Coach senior SREs in infrastructure design
Stay updated on industry best practices

Requirements:

Experience managing complex, distributed systems in production
Strong understanding of cloud platforms (GCP preferred) and Kubernetes
Proficiency in scripting languages and automation tools
Experience with monitoring systems (e.g., Datadog, Prometheus)
Excellent problem-solving and communication skills

We offer:

Remote work environment
Competitive salaries, equity, and 401(k) for US employees
Top-tier healthcare
Health and Wellness Perk
Equipment Stipend
Flexible vacation policy
Team trips and offsites
5-week sabbatical after 4.5 years

Our values:

Blade Runners: Taking ownership and pride in achieving goals
Bread Makers: Humble, egalitarian culture focused on teamwork
Självdistans: Critical self-reflection and objectivity

Join us in transforming customer service with AI and make an impact in a rapidly growing company!

Last updated 2 months ago

Requirements For Staff Site Reliability Engineer

TypeScript

Node.js

Python

Kubernetes

Benefits For Staff Site Reliability Engineer

401k

Medical Insurance

Dental Insurance

Vision Insurance

Equity
