Principal Site Reliability Engineer

Salesforce

Global leader in CRM and enterprise cloud computing solutions.

San Francisco, CA, USA • Seattle, WA, USA

$223,000 - $323,400

Site Reliability

Principal Software Engineer

In-Person

5,000+ Employees

15+ years of experience

Enterprise SaaS · Cloud

This job posting may no longer be active.

Description For Principal Site Reliability Engineer

Salesforce is seeking a Principal Site Reliability Engineer to join its Availability Engineering teams. This role is crucial in driving best-in-class availability across its multi-substrate engineering platform, which serves tens of millions of users. The position requires deep expertise in large-scale systems and concurrency, with a focus on crafting highly available solutions.

As a Principal SRE, you'll work with delivery teams to implement and maintain resilient applications deployed across thousands of compute nodes in multiple data centers. The role involves championing resiliency best practices, working with various cloud platforms (AWS, GCP, Azure & Alibaba), and contributing to open-source technologies.

The ideal candidate will bring 15+ years of software development experience, with at least 5 years in a leadership role. You'll be responsible for reverse engineering solutions, defining availability improvement projects, and maintaining critical infrastructure services. This position offers the opportunity to work on complex technical challenges while ensuring system reliability for one of the world's leading enterprise software companies.

You'll be part of a specialist unit focused on availability and resilience, where you'll have the chance to influence architectural decisions, mentor team members, and drive innovation in system availability. The role combines technical leadership with hands-on development, requiring both strategic thinking and practical implementation skills.

This is an excellent opportunity for a seasoned engineer who is passionate about system reliability, enjoys solving complex distributed systems challenges, and wants to make a significant impact on a platform that powers businesses worldwide. The position offers competitive compensation and the chance to work with cutting-edge technologies in a collaborative environment.

Last updated 21 days ago

Responsibilities For Principal Site Reliability Engineer

Embed with delivery teams in a lead capacity, focusing on corrective and proactive availability measures
Design, develop, debug, and operate resilient applications across distributed systems
Champion resiliency best practices including observability tool integration and auto-scaling
Develop Infrastructure-as-Code using Terraform
Build/integrate with APIs and microservices on containerization frameworks
Resolve complex technical issues and drive innovations for system availability
Participate in on-call rotation to address complex problems in real-time
Balance live runtime management, feature delivery, and retirement of technical debt

Requirements For Principal Site Reliability Engineer

Java

Python

Kubernetes

Related technical degree required (master's preferred)
15+ years of hands-on software development experience
5+ years in a Tech Lead, Principal or Architect capacity
Mastery of object-oriented languages such as Java, Golang, APEX, and Python
Deep experience with core web technologies: HTTP, JSON, REST, XML
Proficiency with databases including Oracle or other relational/NoSQL solutions
Experience owning and operating multiple instances of critical services
Subject-matter expertise in service ownership best practices
Thorough knowledge of Agile development methodology
