Fidelity Investments is seeking a Principal Site Reliability Engineer to join their technology team in delivering high-scale, highly available services with resilience through automation and Infrastructure as Code. This role is perfect for an experienced SRE who is passionate about building reliability into ecosystems by applying best practices in Resiliency Engineering, Automation, Observability, and Chaos Testing.
The ideal candidate will have extensive experience managing systems using Infrastructure as Code tools (IAM, ARM, Terraform, and Chef) and modern monitoring tools (Datadog, Prometheus, and Splunk). You'll work with scripting languages, particularly Python and shell, to help teams scale through production insights, operational automation, developer guidance, and real-time metrics.
As a Principal SRE at Fidelity, you'll be responsible for maintaining scalability and resiliency in complex environments, implementing advanced observability practices, and executing root cause analysis. You'll work closely with both technical and non-technical stakeholders, presenting new software solutions and best practices to development teams.
The role requires a Bachelor's degree in Computer Science or a related field with 5 years of experience, or a Master's degree with 3 years of experience. Key technical requirements include expertise in Envoy Gateway Infrastructure, CI/CD pipelines, Terraform for cloud platforms (AWS, AWS GovCloud, OCI), and comprehensive monitoring solutions using tools like Datadog and Splunk.
Fidelity offers an excellent benefits package including 401(k) with company match, comprehensive healthcare coverage, parental leave, and student loan assistance. Join a company that values innovation and individual merit and provides opportunities for professional growth in a collaborative environment.