Senior Site Reliability Engineer

ClickUp is the world's only all-in-one productivity platform that flexes to the way people want to work, replacing individual workplace productivity tools with a single, unified platform.
Site Reliability
Senior Software Engineer
Remote
1,000 - 5,000 Employees
4+ years of experience
Enterprise SaaS

Description For Senior Site Reliability Engineer

ClickUp is revolutionizing workplace productivity as the world's only all-in-one platform that adapts to how people want to work. We're seeking driven and innovative software engineers with a strong site reliability engineering (SRE) background to join our team. As an SRE at ClickUp, you'll play a crucial role in enhancing the stability, availability, and reliability of our globally distributed, cloud-based infrastructure that serves thousands of users daily.

Your responsibilities will include:

  • Designing and building high-performance, reliable, and scalable systems
  • Collaborating with engineering teams on product design and troubleshooting
  • Improving overall site reliability, including uptime, stability, and observability
  • Managing and enhancing our monitoring infrastructure
  • Implementing and refining our site reliability practices
  • Responding to and preventing downtime events
  • Contributing ideas to our technology and algorithms

We're looking for candidates with:

  • 4-6+ years of experience with Amazon Web Services
  • Expertise in Kubernetes, DevOps, and SRE best practices
  • Experience with IaC, CI/CD, containerization, and monitoring tools
  • Strong knowledge of network security and database management
  • Excellent problem-solving and communication skills

Join ClickUp, one of the fastest-growing SaaS companies worldwide, and help millions of users boost their productivity. We offer a culture of hard work, consistent growth, and a desire to break norms. We value ambition, merit, and a willingness to succeed, regardless of background or personal characteristics.

ClickUp is an Equal Opportunity Employer committed to creating an inclusive environment for all employees. If you're passionate about improving the way people work and ready to tackle complex challenges, we encourage you to apply!

Last updated 2 months ago

Responsibilities For Senior Site Reliability Engineer

  • Participate in designing and building systems for maximum performance, reliability, and scalability
  • Work with the engineering teams on product design, decisions, and troubleshooting
  • Improve general stability and observability, including the metrics that track uptime and stability
  • Champion our monitoring infrastructure
  • Implement and improve our general site reliability posture
  • Respond to and troubleshoot downtime events while actively developing safeguards to prevent them
  • Participate in brainstorming sessions with the engineering team and contribute ideas to our technology and algorithms

Requirements For Senior Site Reliability Engineer

Kubernetes, PostgreSQL, Node.js, Linux

  • 4-6+ years of experience with the Amazon Web Services ecosystem
  • Experience working with Kubernetes
  • Experience managing production-critical infrastructure, with a DevOps mindset
  • Familiar with SRE best practices and procedures
  • Experience with IaC (CDK, Terraform), CI/CD (GitHub Actions, ArgoCD)
  • Familiar with containerisation (Docker)
  • Knowledgeable in network, firewall, and security best practices
  • Experience with self-healing automation and monitoring tools (DataDog, CloudWatch)
  • Knowledge of relational databases, preferably PostgreSQL (not mandatory)
  • A strong, operationally focused self-starter and problem-solver
  • Excellent interpersonal, written, and oral communication skills
  • Experience with application security testing is a plus (not mandatory)
  • Familiarity or experience with Node.js is a plus (not mandatory)
  • Experience with management of Linux-based EC2 instances
