Staff Site Reliability Engineer

Forma provides flexible benefits software helping companies offer competitive benefits packages while reducing costs by giving employees choice in spending benefit allowances.
United States
$150,000 - $250,000
Site Reliability
Staff Software Engineer
Remote
101 - 500 Employees
8+ years of experience
Enterprise SaaS · Finance

Description For Staff Site Reliability Engineer

Forma is revolutionizing the employee benefits market with its flexible benefits software platform. Founded in 2017, the company tackles the inefficiencies of traditional benefits approaches by offering customizable solutions through Lifestyle Spending Accounts, Health Spending Accounts, and more. It serves notable clients like Stripe, Zoom, and Lululemon, and maintains satisfaction ratings of 75 NPS and 98 CSAT.

As a Staff Site Reliability Engineer at Forma, you'll join a team responsible for managing foundational infrastructure and developer experience. This role requires expertise in cloud platforms (GCP, AWS, Azure), Infrastructure as Code (Terraform), and strong programming skills, particularly in Python. You'll be instrumental in designing and implementing core services including container orchestration, databases, and cloud infrastructure.

Key responsibilities include building monitoring and observability systems, leading incident response, developing automation tools, maintaining deployment pipelines, and conducting architecture reviews. You'll also mentor team members and apply software engineering principles to improve system reliability and resilience.

The ideal candidate brings 8+ years of backend development experience, deep knowledge of operating systems and cloud infrastructure, and expertise in containerization technologies. Experience with CI/CD pipelines and IaC tools is valued, as is a startup or fintech background. Forma offers comprehensive benefits including remote work, health insurance, wellness programs, and generous parental leave.

This role presents an exciting opportunity to shape infrastructure direction at a growing company while working with modern technologies and solving complex challenges in the benefits technology space.

Last updated a month ago

Responsibilities For Staff Site Reliability Engineer

  • Build and maintain on-call, monitoring and alerting systems
  • Troubleshoot and resolve outages, conduct post-incident reviews
  • Develop tools and scripts to streamline operations
  • Build, maintain, and troubleshoot CI/CD infrastructure, and promote best practices
  • Analyze complex problems, identify root causes, and develop effective solutions
  • Mentor Engineering team members
  • Conduct chaos engineering experiments to identify system weaknesses

Requirements For Staff Site Reliability Engineer

Python
Kubernetes
Linux
  • 8+ years of backend software development experience
  • Strong understanding of operating systems (Linux, Windows, etc.), networking, and cloud infrastructure (AWS, GCP, and/or Azure)
  • Knowledge of containerization technologies (Kubernetes) and orchestration tools
  • Expertise in at least one programming language (Python, Go, Java, etc.) and scripting languages (Bash)
  • Experience with CI/CD pipelines and tools (CircleCI, GitHub Actions, etc.)
  • Working knowledge of IaC tools such as Terraform

Benefits For Staff Site Reliability Engineer

Medical Insurance
Dental Insurance
Vision Insurance
401k
Parental Leave
  • Remote-first working environment
  • Medical, dental and vision insurance plans
  • Employee wellness program
  • One-time home office stipend
  • 401(k) savings plan
  • Flexible PTO policy
  • 12 weeks Parental Leave + 4 additional weeks for the Birthing Parent
