Site Reliability Engineer

Kroo Bank is a fintech company on a mission to build the world's greatest social bank, aiming to transform banking for the better.
Site Reliability
Senior Software Engineer
Hybrid
51 - 100 Employees
5+ years of experience
Finance

Description For Site Reliability Engineer

Kroo Bank is on a mission to build the world's greatest social bank, in the belief that banking needs to change for the better. As a Site Reliability Engineer at Kroo, you'll take ownership of the implementation, monitoring, maintenance, and improvement of core services. Key responsibilities include advocating for reliability, setting and monitoring SLOs, developing system-wide application monitoring, conducting reliability and resilience tests, improving the codebase and infrastructure, building documentation and playbooks, and joining the on-call schedule for major incident management. The ideal candidate has experience with public cloud providers, IaC tools, programming languages (preferably TypeScript or Clojure), the SDLC, monitoring tools, SRE practices, and managing high-performance, high-security applications. Kroo offers a hybrid working environment, a modern office in Central London, and a comprehensive benefits package that includes generous holiday time, mental health support, a workplace pension, and various schemes to support work-life balance and personal growth.

Last updated 2 months ago

Responsibilities For Site Reliability Engineer

  • Advocate for reliability across the engineering team
  • Create, set, and monitor SLOs for core services
  • Develop and maintain system-wide application monitoring
  • Monitor third-party provider performance
  • Conduct reliability and resilience tests
  • Improve codebase and infrastructure reliability
  • Build and maintain documentation and playbooks
  • Implement changes to improve platform performance
  • Assist with Disaster Recovery Plan development and testing
  • Support major releases and go-lives
  • Participate in on-call schedule for major incident management

Requirements For Site Reliability Engineer

  • Experience with public cloud providers (AWS, Azure, GCP)
  • Knowledge of IaC tools (CloudFormation, Terraform)
  • Knowledge of programming languages (TypeScript or Clojure preferred)
  • Understanding of SDLC
  • Experience with monitoring and APM tools (Datadog preferred)
  • Knowledge of SRE practices
  • Experience in high-performance, high-security applications
  • Familiarity with microservice architecture
  • Experience managing technical incidents
  • Knowledge of IT security practices
  • Experience implementing disaster recovery strategies
  • Excellent communication skills

Benefits For Site Reliability Engineer

  • 25 days annual leave
  • 8 bank holidays
  • 1 Kroo bank holiday
  • 1 birthday day off
  • 3 personal days
  • Employer-sponsored volunteer program
  • Mental health support (Spill)
  • Workplace pension
  • Top-notch equipment (MacBook)
  • Modern office in Central London
  • Cycle to Work scheme
  • Electric Car scheme
  • Enhanced parental leave
  • Healthcare for employee and nuclear family (Vitality)
