Senior Site Reliability Engineer

ClickUp

ClickUp is the world's only all-in-one productivity platform that flexes to the way people want to work, replacing individual workplace productivity tools with a single, unified platform.

San Diego, CA, USA

Site Reliability

Senior Software Engineer

Remote

1,000 - 5,000 Employees

4+ years of experience

Enterprise SaaS

Description For Senior Site Reliability Engineer

ClickUp is revolutionizing workplace productivity as the world's only all-in-one platform that adapts to how people want to work. We're seeking driven and innovative software engineers with a strong site reliability engineering (SRE) background to join our team. As an SRE at ClickUp, you'll play a crucial role in enhancing the stability, availability, and reliability of our globally distributed, cloud-based infrastructure that serves thousands of users daily.

Your responsibilities will include:

Designing and building high-performance, reliable, and scalable systems
Collaborating with engineering teams on product design and troubleshooting
Improving overall site reliability, including uptime, stability, and observability
Managing and enhancing our monitoring infrastructure
Implementing and refining our site reliability practices
Responding to and preventing downtime events
Contributing ideas to our technology and algorithms

We're looking for candidates with:

4-6+ years of experience with Amazon Web Services
Expertise in Kubernetes, DevOps, and SRE best practices
Experience with IaC, CI/CD, containerization, and monitoring tools
Strong knowledge of network security and database management
Excellent problem-solving and communication skills

Join ClickUp, one of the fastest-growing SaaS companies worldwide, and help millions of users boost their productivity. We offer a culture of hard work, consistent growth, and a desire to break norms. We value ambition, merit, and a willingness to succeed, regardless of background or personal characteristics.

ClickUp is an Equal Opportunity Employer committed to creating an inclusive environment for all employees. If you're passionate about improving the way people work and ready to tackle complex challenges, we encourage you to apply!

Last updated 2 months ago

Responsibilities For Senior Site Reliability Engineer

Participate in designing and building systems for maximum performance, reliability, and scalability
Work with the engineering teams on product design, decisions, and troubleshooting
Improve general stability and observability, along with the metrics that track uptime and stability
Champion our monitoring infrastructure
Implement and improve our general site reliability posture
Respond to and troubleshoot downtime events while actively developing safeguards to prevent them
Participate in brainstorming sessions with the engineering team and contribute ideas to our technology and algorithms

Requirements For Senior Site Reliability Engineer

Kubernetes

PostgreSQL

Node.js

Linux

4-6+ years of experience with the Amazon Web Services ecosystem
Experience working with Kubernetes
Experience managing production-critical infrastructure, with a DevOps mindset
Familiar with SRE best practices and procedures
Experience with IaC (CDK, Terraform), CI/CD (GitHub Actions, ArgoCD)
Familiar with containerization (Docker)
Knowledgeable in network, firewall, and security best practices
Experience with self-healing automation and monitoring tools (Datadog, CloudWatch)
Knowledge of relational databases, preferably PostgreSQL (not mandatory)
A strong, operationally focused self-starter and problem-solver
Excellent interpersonal, written, and oral communication skills
Experience with application security testing is a plus (not mandatory)
Familiarity or experience with Node.js is a plus (not mandatory)
Experience with management of Linux-based EC2 instances
