Site Reliability Engineering (SRE) at Salesforce combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. This principal role will shape the technical strategy for SRE and influence the strategy for the Availability Cloud. The position offers challenges of scale unique to Salesforce and requires expertise in coding, algorithms, complexity analysis, and large-scale system design.
The role involves embedding with product teams, defining availability roadmaps, and delivering against them. You'll be crucial in maturing the SRE practice, mentoring engineers, and scaling the impact of your community. The team culture emphasizes diversity, intellectual curiosity, and problem-solving in a blame-free environment.
As a Principal/Architect, you'll work on meaningful projects with the support and mentorship needed to learn and grow. You'll be responsible for developing full paved-path observability platform integrations, maintaining service health, and scaling systems through automation. The position requires hands-on coding (at least 25%) alongside leading and mentoring others.
The ideal candidate brings 15+ years of software development experience, with deep expertise in distributed systems, service ownership, and technical leadership. You'll work with technologies like Kubernetes, Istio, and public cloud platforms, applying your knowledge of core web technologies and service ownership best practices.
Join Salesforce's SRE team to tackle complex challenges, drive technical strategy, and shape the future of large-scale system reliability. You'll work with a diverse, collaborative team in a supportive environment focused on continuous learning and growth.