Manager Site Reliability Engineer

Next-Gen Banking Tech company empowering banks and fintechs to launch banking products with its cloud-native processing platform, Zeta Tachyon.
Site Reliability
Staff Software Engineer
In-Person
1,000 - 5,000 Employees
10+ years of experience
Finance · Enterprise SaaS

Description For Manager Site Reliability Engineer

Zeta, a pioneering Next-Gen Banking Tech company valued at $1.5 billion, is seeking a Manager Site Reliability Engineer to join our innovative team. Founded in 2015, we've revolutionized banking technology with our cloud-native Zeta Tachyon platform, which processes more than 20 million cards globally.

As our Manager SRE, you'll play a crucial role in bridging development and operations, ensuring system reliability and scalability. You'll lead a team of SREs, implementing best practices in automation, monitoring, and infrastructure management. The position offers the opportunity to work with cutting-edge technologies including Kubernetes, cloud platforms, and modern DevOps tools.

Our ideal candidate brings 10-15 years of SRE experience and strong technical expertise in programming, cloud computing, and infrastructure as code. You'll be part of a company with 1,700+ employees (70% in R&D) across the US, EMEA, and Asia, backed by major investors like SoftBank and Mastercard.

This role combines technical leadership with team management, offering the chance to shape the reliability and scalability of systems that transform banking experiences. You'll work in our Hyderabad office, contributing to a culture of automation and continuous improvement while mentoring team members and driving technical excellence.

Join us in revolutionizing banking technology while working with a diverse, inclusive team that values innovation and technical expertise. This is an excellent opportunity for an experienced SRE leader looking to make a significant impact in the fintech industry.

Last updated 2 months ago

Responsibilities For Manager Site Reliability Engineer

  • Ensure reliability of software systems through scalable infrastructure
  • Develop automation tools and scripts for operational tasks
  • Monitor system performance and respond to incidents
  • Conduct capacity planning and usage pattern analysis
  • Implement and maintain monitoring and logging solutions
  • Lead and motivate a team of SREs
  • Provide mentorship and coaching to team members
  • Implement security best practices in infrastructure
  • Develop and maintain disaster recovery plans
  • Drive continuous improvement initiatives

Requirements For Manager Site Reliability Engineer

  • 10-15 years of experience in site reliability engineering
  • B.Tech/M.Tech in computer science, information technology or related field
  • Proficiency in Python, Go, and shell scripting (Bash)
  • Experience with Docker and Kubernetes
  • Proficiency in cloud platforms (AWS, Azure, or Google Cloud)
  • Knowledge of Infrastructure as Code tools like Terraform
  • Experience with monitoring tools (Prometheus, Grafana, ELK stack)
  • Understanding of networking concepts and protocols
  • Proficiency with version control systems like Git
  • Experience in CI/CD implementation
