Site Reliability Engineer

AI-driven social platform serving millions of global users
Site Reliability
Mid-Level Software Engineer
In-Person
3+ years of experience
AI

Description For Site Reliability Engineer

PalUp is revolutionizing social interactions through its AI-driven platform, which serves millions of users globally. As a Site Reliability Engineer, you'll be at the heart of the engineering team, ensuring the platform's stability, reliability, and efficiency.

The role demands a skilled engineer with 3+ years of SRE/DevOps experience who excels in cloud services (particularly GCP), Linux administration, and container orchestration with Kubernetes. You'll be working with cutting-edge technologies including Python, Golang, and modern monitoring tools like Grafana and Prometheus.

Your responsibilities will range from designing and implementing monitoring systems to optimizing CI/CD pipelines and managing cloud-based deployments. You'll play a key role in analyzing and improving system performance, ensuring high availability, and developing automation tools to streamline operations.

The ideal candidate values automation, proactive problem-solving, and collaborative teamwork. You'll thrive in PalUp's dynamic environment where innovation and technical excellence are paramount. The company emphasizes creating scalable solutions and empowering teams to deliver world-class experiences.

This is an excellent opportunity for a mid-level engineer passionate about site reliability and DevOps to make a significant impact in a growing AI-focused company. You'll work alongside talented engineers who value collaboration, fairness, and mutual respect, while helping shape the future of AI-driven social interactions.

Last updated 2 months ago

Responsibilities For Site Reliability Engineer

  • Design, implement, and maintain monitoring and alerting systems to ensure service stability
  • Maintain and optimize CI/CD pipelines to improve deployment efficiency and reliability
  • Manage and improve cloud-based deployment processes using Docker, Kubernetes, and related tools
  • Analyze system bottlenecks and proactively implement architectural and performance optimizations
  • Collaborate with development teams to ensure high availability and fault tolerance of applications and databases
  • Develop scripts and automation tools (e.g., Python, Shell scripts) to streamline operational tasks

Requirements For Site Reliability Engineer

Python
Go
Kubernetes
Linux
PostgreSQL
MongoDB
MySQL
  • 3+ years of experience in SRE/DevOps or related roles
  • Strong expertise in cloud services and infrastructure (GCP preferred, AWS or Azure is a plus)
  • Solid knowledge of Linux system administration and maintenance
  • Proficiency in programming languages such as Python or Golang
  • Hands-on experience with monitoring and alerting systems (Grafana, Prometheus)
  • Advanced knowledge of Kubernetes and containerization tools like Docker
  • Familiarity with log management systems and operational configurations
  • Strong English reading and communication skills for technical documentation
