Site Reliability Engineer

A global technology company developing real-time communication platforms and services across 230+ countries.
Site Reliability
Senior Software Engineer
Hybrid
5+ years of experience
Enterprise SaaS

Description For Site Reliability Engineer

Hyperconnect's Platform Department is seeking a Site Reliability Engineer to join its team, which provides infrastructure and common platform technology for all services, including Azar and new products. The SRE team's mission is to keep every service developed at Hyperconnect stable so that users can enjoy their experiences without interruption.

Working with AWS, Kubernetes, and a service mesh, you'll manage modern compute and network infrastructure across all services and systems. The role goes beyond infrastructure management and offers room to contribute deeply to backend engineering. Because the business is real-time, you'll work on high-performance, low-latency systems, and you'll gain experience operating large-scale infrastructure in a global environment that spans multiple products and both B2B and B2C services.

The team manages infrastructure with tools such as Terraform, Helm, ArgoCD, and Spinnaker, and runs monitoring on Zabbix, Prometheus, OpenTelemetry, and Elasticsearch. You'll join a team that values automation, continuous improvement, and proactive problem-solving, and that collaborates closely with multiple technical teams and stakeholders.

Responsibilities For Site Reliability Engineer

  • Build and operate high-availability system infrastructure in an AWS cloud environment
  • Implement and manage system/application logging, monitoring, and automation using tools like Zabbix and Prometheus
  • Lead incident response and foster a postmortem culture
  • Identify and implement service improvements based on SLO/SLI metrics
  • Conduct proofs of concept (PoCs) for new technologies and take them to production
  • Manage and improve monitoring systems using OpenTelemetry and Elasticsearch
  • Provide application monitoring for 300+ microservices

Requirements For Site Reliability Engineer

Kubernetes
Go
Python
Linux
  • Strong understanding of CS fundamentals, especially Linux and Networking
  • Understanding of container technologies
  • Programming ability in Python and Go (Golang)
  • Practical experience with Linux servers in public cloud environments (AWS)
  • Excellent communication skills and documentation ability
  • Ability to identify and proactively solve various service issues
  • Enthusiasm for learning new technologies
