Site Reliability Engineer

AI-driven social platform serving millions of global users
Site Reliability
Mid-Level Software Engineer
In-Person
3+ years of experience
AI

Description For Site Reliability Engineer

PalUp is revolutionizing social interactions through its AI-driven platform, which serves millions of users globally. As a Site Reliability Engineer, you'll be at the heart of its engineering team, ensuring the platform's stability, reliability, and efficiency.

The role demands a skilled engineer with 3+ years of SRE/DevOps experience who excels in cloud services (particularly GCP), Linux administration, and container orchestration with Kubernetes. You'll be working with cutting-edge technologies including Python, Golang, and modern monitoring tools like Grafana and Prometheus.

Your responsibilities will range from designing and implementing monitoring systems to optimizing CI/CD pipelines and managing cloud-based deployments. You'll play a key role in analyzing and improving system performance, ensuring high availability, and developing automation tools to streamline operations.

The ideal candidate values automation, proactive problem-solving, and collaborative teamwork. You'll thrive in PalUp's dynamic environment where innovation and technical excellence are paramount. The company emphasizes creating scalable solutions and empowering teams to deliver world-class experiences.

This is an excellent opportunity for a mid-level engineer passionate about site reliability and DevOps to make a significant impact in a growing AI-focused company. You'll work alongside talented engineers who value collaboration, fairness, and mutual respect, while helping shape the future of AI-driven social interactions.

Last updated 2 days ago

Responsibilities For Site Reliability Engineer

  • Design, implement, and maintain monitoring and alerting systems to ensure service stability
  • Maintain and optimize CI/CD pipelines to improve deployment efficiency and reliability
  • Manage and improve cloud-based deployment processes using Docker, Kubernetes, and related tools
  • Analyze system bottlenecks and proactively implement architectural and performance optimizations
  • Collaborate with development teams to ensure high availability and fault tolerance of applications and databases
  • Develop scripts and automation tools (e.g., Python, Shell scripts) to streamline operational tasks

Requirements For Site Reliability Engineer

Python, Go, Kubernetes, Linux, PostgreSQL, MongoDB, MySQL
  • 3+ years of experience in SRE/DevOps or related roles
  • Strong expertise in cloud services and infrastructure (GCP preferred; AWS or Azure is a plus)
  • Solid knowledge of Linux system administration and maintenance
  • Proficiency in programming languages such as Python or Golang
  • Hands-on experience with monitoring and alerting systems (Grafana, Prometheus)
  • Advanced knowledge of Kubernetes and containerization tools like Docker
  • Familiarity with log management systems and operational configurations
  • Strong English reading and communication skills for technical documentation
