Staff Engineer, Platform/SRE

Leading omnichannel customer engagement solution powering personalized customer journeys across mobile and web push notifications, in-app messaging, SMS, and email.
Site Reliability
Staff Software Engineer
Remote
101 - 500 Employees
8+ years of experience
Enterprise SaaS

Description For Staff Engineer, Platform/SRE

OneSignal is a leading omnichannel customer engagement solution that has achieved remarkable scale, serving billions of HTTP requests daily. As a Series C venture-backed company supported by SignalFire, Rakuten Ventures, and Y Combinator, we're revolutionizing how businesses connect with their users. The Platform/SRE team plays a crucial role in maintaining our 99.95% uptime while scaling our infrastructure.

As a Staff Engineer in Platform/SRE, you'll be at the forefront of our infrastructure engineering efforts, working with technologies like Rust and Go. You'll operate our infrastructure and engineer its future, managing critical systems including PostgreSQL, Scylla, Redis, Kafka, and Kubernetes. The role combines software engineering with an infrastructure focus, requiring both deep technical knowledge and automation expertise.

The position offers unique challenges in scaling our systems while maintaining high reliability. You'll work with a diverse tech stack, automate data center operations, and collaborate across teams to architect highly scalable services. The role requires not just operational knowledge but also the ability to turn manual processes into automated systems.

OneSignal offers a remote-first culture with offices in San Mateo, CA and London, UK. We pride ourselves on maintaining a healthy work-life balance and fostering an environment of ownership and personal growth. Join us in shaping the future of customer engagement technology while working with a team that values both technical excellence and human connection.

Last updated a month ago

Responsibilities For Staff Engineer, Platform/SRE

  • Improve system performance by identifying and removing bottlenecks
  • Set up infrastructure and configuration as code using Kubernetes and Terraform
  • Establish and maintain the observability and monitoring stack
  • Define and implement CI/CD best practices
  • Collaborate with engineering teams to architect scalable services
  • Participate in the on-call rotation and incident response

Requirements For Staff Engineer, Platform/SRE

Go, Rust, Kubernetes, PostgreSQL, Redis, Kafka
  • At least 8 years of platform experience
  • Experience operating reliable production systems at scale
  • Knowledge of Linux systems internals
  • Ability to automate tasks
  • Experience managing PostgreSQL for high-throughput systems at scale
  • Operational experience deploying and managing Kubernetes
  • Experience working with cloud providers (AWS/GCP/Azure)
