Senior Site Reliability Engineer - Platform Microservices Reliability

Guidewire is the platform P&C insurers trust to engage, innovate, and grow efficiently. We provide cloud services combining digital, core, analytics, and AI.
Site Reliability
Senior Software Engineer
Hybrid
1,000 - 5,000 Employees
10+ years of experience
Enterprise SaaS · Finance

Description For Senior Site Reliability Engineer - Platform Microservices Reliability

Guidewire is seeking a Senior Site Reliability Engineer for Platform Microservices Reliability. As part of the Platform team, you'll be dedicated to creating and running software that improves the reliability of systems in production, serving hundreds of customers and supporting millions of transactions daily. You'll ensure the reliability of Guidewire's flagship cloud platform and InsuranceSuite products, build tooling for efficient operations, and collaborate closely with core product developers.

Key responsibilities include:

  • Enhancing reliability and efficiency of microservices applications
  • Participating in design reviews and production readiness checks
  • Analyzing observability data to improve operational metrics
  • Automating infrastructure in AWS
  • Building and maintaining observability tooling and dashboards
  • Improving the incident management lifecycle

Required skills:

  • Bachelor's degree in Computer Science or a related field, plus 10+ years of experience
  • Proficiency in Bash, Python, and/or Go
  • Experience with Java-based web applications, Linux systems, and AWS
  • Expertise in containerization (Docker, Kubernetes/EKS)
  • Strong understanding of SSO, SAML, OAuth, and X.509 certificates
  • Experience with relational databases and observability tools

Guidewire offers a collaborative environment where you can make an impact, be inspired by colleagues, and be empowered to go further. Join a team passionate about delivering quality products and support in the P&C insurance industry.

This role involves occasional travel (less than 5%) and requires participation in rotating on-call support for production emergencies.

Last updated 2 months ago

Responsibilities For Senior Site Reliability Engineer - Platform Microservices Reliability

  • Collaborate with development teams to enhance the reliability and efficiency of microservices applications
  • Engage with product development teams in design reviews and production readiness checks
  • Analyze data from observability and monitoring tools to improve operational metrics
  • Create system documentation and training materials
  • Oversee and automate the team's growing presence in AWS
  • Build and maintain observability tooling, metrics, and dashboarding for global platform product infrastructure
  • Improve the incident management lifecycle
  • Collaborate with engineering teams, providing product feedback and contributing code to the product when necessary
  • Participate in rotating on-call support for weekend production emergencies

Requirements For Senior Site Reliability Engineer - Platform Microservices Reliability

Java · Python · Go · Linux · Kubernetes · PostgreSQL · Kafka

  • Bachelor's degree in Computer Science or a related field, with 10+ years of experience
  • Software engineering and task automation skills with Bash, Python, and/or Go
  • Experience in developing and maintaining Java-based web applications
  • Deep background with Linux systems and engineering
  • Extensive experience engineering and automating on Amazon Web Services (AWS)
  • Prior experience with infrastructure-as-code (IaC) tools such as Terraform, Terragrunt, or Terraspace
  • Background supporting production at scale in a heavily microservice-based environment
  • Hands-on engineering and ops expertise in containerization (Docker, Helm, Kubernetes/EKS, CNI and Ingress networking)
  • Strong understanding of Single Sign-On (SSO), SAML, and OAuth
  • Solid expertise in X.509 certificate technology and the basic concepts of encryption
  • Experience working with relational databases such as Aurora PostgreSQL and/or RDS for Oracle
  • Advanced exposure to application development, web UI, JSON, and application architecture
  • Extensive experience using observability tools such as Datadog, CloudWatch, and PagerDuty
  • Ability to read, write, and speak English
