Zscaler, a leading cloud security company, is seeking a Staff Site Reliability Engineer-Technical Duty Officer to join its Shared Platform Engineer team. The role involves leading the company's transformation into a world-leading SRE organization, providing expert leadership during critical outages, promoting customer-focused approaches, developing scalable process frameworks, and collaborating with product teams to improve service reliability.
Key responsibilities include:
- Advocating for SRE principles within the Engineering Department
- Coordinating multiple teams during critical outages to streamline decision-making and speed up resolution
- Addressing and mitigating issues in customer environments worldwide
- Fostering a culture of continuous learning and technical excellence
- Implementing observability strategies for rapid problem diagnosis and response
- Analyzing failures and integrating insights to improve service reliability, scalability, and operational efficiency
The ideal candidate will have:
- 5+ years of experience as a Site Reliability Engineer
- Hands-on experience troubleshooting Linux-based systems
- Strong networking knowledge (TCP/IP, SSL/TLS, DNSSEC, IPsec, BGP)
- Coding experience, preferably in Python
- Bachelor's degree in Computer Science or related field
Preferred qualifications:
- Experience supporting High/Moderate FedRAMP environments
- Understanding of observability practices and tools (Grafana, Datadog, Splunk, etc.)
- Experience leading major incidents in large-scale, high-uptime environments
This role offers remote work options, with a preference for candidates in the Eastern Time Zone. Zscaler provides comprehensive benefits, including a range of health plans, time off, parental leave, retirement options, and education reimbursement.
Join Zscaler's Engineering team and contribute to building and innovating the world's largest cloud security platform, serving thousands of enterprise customers globally.