Replicant, the leader in Contact Center Automation, is seeking a Staff Site Reliability Engineer to scale our infrastructure and systems. We use AI to automate customer service calls and are now leveraging Large Language Models (LLMs) to transform the industry. Our tech stack includes TypeScript/Node.js and Python within a Kubernetes environment on GCP, along with tools like Helm, Terraform, Datadog, and Prometheus.
As a Staff SRE, you'll:
- Execute long-term initiatives for smooth operation and high availability of production systems
- Identify and resolve performance bottlenecks
- Partner with engineering teams to improve reliability and scalability
- Participate in the on-call rotation
- Coach senior SREs in infrastructure design
- Stay current with industry best practices
Requirements:
- Experience managing complex, distributed systems in production
- Strong understanding of cloud platforms (GCP preferred) and Kubernetes
- Proficiency in scripting languages and automation tools
- Experience with monitoring systems (e.g., Datadog, Prometheus)
- Excellent problem-solving and communication skills
We offer:
- Remote work environment
- Competitive salaries, equity, and 401(k) for US employees
- Top-tier healthcare
- Health and wellness perk
- Equipment stipend
- Flexible vacation policy
- Team trips and offsites
- 5-week sabbatical after 4.5 years
Our values:
- Blade Runners: Taking ownership and pride in achieving goals
- Bread Makers: Humble, egalitarian culture focused on teamwork
- Självdistans: Critical self-reflection and objectivity
Join us in transforming customer service with AI and make an impact in a rapidly growing company!