Site Reliability Engineer

A global technology company developing real-time communication platforms and services across 230+ countries.
Site Reliability
Senior Software Engineer
Hybrid
5+ years of experience
Enterprise SaaS

Description For Site Reliability Engineer

Hyperconnect's Platform Department is seeking a Site Reliability Engineer to join the team that provides infrastructure and common platform technology for all services, including Azar and new products. The SRE team's mission is to keep every service developed at Hyperconnect stable so that users can enjoy special experiences without interruption.

Working with AWS, Kubernetes, and a service mesh, you'll manage modern computing and network infrastructure across all services and systems. The role goes beyond infrastructure management and allows deep contribution to backend engineering. Given the real-time nature of the business, you'll work on high-performance, low-latency systems. You'll gain experience managing large-scale infrastructure in a global environment, supporting multiple products, and working in both B2B and B2C environments.

The team manages infrastructure with Terraform, Helm, ArgoCD, and Spinnaker, and monitors it with Zabbix, Prometheus, OpenTelemetry, and Elasticsearch. You'll join a team that values automation, continuous improvement, and proactive problem-solving, and that collaborates with multiple technical teams and stakeholders.

Last updated 3 days ago

Responsibilities For Site Reliability Engineer

  • Build and operate high-availability system infrastructure in the AWS cloud environment
  • Implement and manage system/application logging, monitoring, and automation using tools like Zabbix and Prometheus
  • Lead incident response and foster a postmortem culture
  • Identify and implement service improvements based on SLO/SLI metrics
  • Conduct PoCs for new technologies and implement them in production
  • Manage and improve monitoring systems using OpenTelemetry and Elasticsearch
  • Provide application monitoring for 300+ microservices

Requirements For Site Reliability Engineer

Key skills: Kubernetes, Go, Python, Linux
  • Strong understanding of CS fundamentals, especially Linux and networking
  • Understanding of container technologies
  • Programming ability in Python and Go
  • Practical experience with Linux servers in public cloud environments (AWS)
  • Excellent communication skills and documentation ability
  • Ability to identify and proactively solve various service issues
  • Enthusiasm for learning new technologies
