Xero, a leading platform for small business accounting and bookkeeping, is seeking a Site Reliability Engineer specializing in Chaos Engineering. This role is part of the Site Reliability Engineering organization and focuses on enhancing system resilience through controlled disruption testing.
The position involves designing and running chaos experiments that expose weaknesses in system architecture before they cause production incidents. You'll be responsible for building and maintaining a chaos engineering environment that enables scalable, repeatable testing across Xero's infrastructure.
As a Chaos Engineering SRE, you'll work with cloud platforms (AWS, Azure, GCP) and container orchestration tools like Kubernetes. The role requires proficiency in programming languages such as Python, Go, or Java, along with experience in chaos engineering tools like Gremlin or Chaos Monkey.
Xero offers a benefits package that includes generous paid leave, comprehensive health coverage, 401(k) matching, and 26 weeks of paid parental leave. The company maintains a human-first culture that values diversity, inclusion, and work-life balance, which suits engineers who want to make a meaningful impact while growing their careers.
The role combines technical expertise with collaborative leadership: you'll work across teams to implement improvements and educate others on chaos engineering principles. This is an opportunity to shape the reliability and resilience of systems that serve millions of small businesses worldwide, alongside a supportive team that values innovation and technical excellence.