Senior Site Reliability Engineer

Toast is driven by building the restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love.
$131,000 - $210,000
Site Reliability
Senior Software Engineer
Remote
3+ years of experience

Description For Senior Site Reliability Engineer

Toast is seeking a Senior Site Reliability Engineer to join its team. As an SRE at Toast, you'll be responsible for enabling engineering teams to keep customer-facing services and other Toast production systems running smoothly. You'll be a blend of pragmatic operator and software craftsperson, applying sound software engineering principles, operational discipline, and mature automation to Toast's environments and codebase.

Key responsibilities include:

  1. Implementing and evolving a world-class observability technology stack (25% of role)
  2. Acting as a champion for reliability and working with partner teams to improve resiliency (25%)
  3. Facilitating production triage, incident resolution, and root cause analysis (20%)
  4. Supporting and enabling the adoption of service resilience testing and chaos engineering (15%)

The ideal candidate will have 3-7 years of experience building and running production systems, a deep understanding of cloud and microservice architecture, and experience with major cloud platforms. You should be comfortable with coding, have experience with observability platforms, and possess a thirst for learning.

Toast offers competitive compensation, with a salary range of $131,000 to $210,000 USD, along with a comprehensive benefits package. This is a remote position, offering flexibility for the right candidate.

Join Toast in its mission to empower the restaurant community to delight guests, do what they love, and thrive. Apply now to be part of a team that's baking success into every aspect of the restaurant industry!


Responsibilities For Senior Site Reliability Engineer

  • Implement and evolve a world-class observability technology stack
  • Act as a champion for reliability and work with partner teams to improve the resiliency and reliability of all services
  • Facilitate and drive production triage, incident resolution, and retrospective/root cause analysis
  • Support the adoption of a platform for service resilience testing and chaos engineering
  • Build and own a performance testing framework/environment

Requirements For Senior Site Reliability Engineer

  • Technologies: Java, Linux
  • Extensive and broad industry experience, with 3-7 years building and running production systems and participating in incident calls
  • Deep understanding of cloud and microservice architecture, and the JVM
  • Comfortable reading, writing, and debugging code
  • Experience with observability platforms (Datadog, Splunk, New Relic, etc.), including APM, RUM, and synthetic monitoring
  • Demonstrated experience working with at least one major cloud platform (AWS, GCP, or Azure)
  • Exposure to complex, mission-critical, large-scale distributed systems
  • Polyglot technologist/generalist with a thirst for learning

Benefits For Senior Site Reliability Engineer

  • 401k
  • Medical, dental, and vision insurance
  • Competitive compensation and benefits programs
  • Flexibility to meet Toasters' changing needs
