Principal Site Reliability Engineer

A privately held financial services company that makes financial expertise broadly accessible and effective in helping people live the lives they want.
Site Reliability
Principal Software Engineer
In-Person
5+ years of experience
Finance

Description For Principal Site Reliability Engineer

Fidelity Investments is seeking a Principal Site Reliability Engineer to join its technology team in delivering high-scale, highly available services with resilience through automation and Infrastructure as Code. This role is perfect for an experienced SRE who is passionate about building reliability into ecosystems by applying best practices in Resiliency Engineering, Automation, Observability, and Chaos Testing.

The ideal candidate will have extensive experience managing systems using infrastructure as code tools (IAM, ARM, Terraform, and Chef) and modern monitoring tools (Datadog, Prometheus, and Splunk). You'll be working with various scripting languages, particularly Python and Shell scripting, to help teams scale through production insights, operational automation, developer guidance, and real-time metrics.

As a Principal SRE at Fidelity, you'll be responsible for maintaining scalability and resiliency in complex environments, implementing advanced observability practices, and executing root cause analysis. You'll work closely with both technical and non-technical stakeholders, presenting new software solutions and best practices to development teams.

The role requires a Bachelor's degree in Computer Science or related field with 5 years of experience, or a Master's degree with 3 years of experience. Key technical requirements include expertise in Envoy Gateway Infrastructure, CI/CD pipelines, Terraform for cloud platforms (AWS, AWS GovCloud, OCI), and comprehensive monitoring solutions using tools like Datadog and Splunk.

Fidelity offers an excellent benefits package including 401(k) with company match, comprehensive healthcare coverage, parental leave, and student loan assistance. Join a company that values innovation and individual merit and provides opportunities for professional growth in a collaborative environment.

Responsibilities For Principal Site Reliability Engineer

  • Instruments distributed systems at scale, applying systems skills to build and operate monitoring, logging, and alerting services
  • Maintains scalability and resiliency in complex environments
  • Implements advanced observability practices and techniques at scale
  • Triages and executes root cause analysis
  • Manages and interprets large datasets using query languages and visualization tools
  • Communicates with both technical and non-technical audiences
  • Presents new software, methods, and practices to developers
  • Works with a variety of individuals and groups in a constructive and collaborative manner
  • Applies Cloud Computing and DevOps concepts including CI/CD pipelines in system and infrastructure maintenance

Requirements For Principal Site Reliability Engineer

  • Key skills: Kubernetes, Python, Redis
  • Bachelor's degree in Computer Science, Engineering, IT, Information Systems, Mathematics, Physics, or related field
  • 5 years of experience as a Principal Site Reliability Engineer
  • Experience designing, building, deploying, and maintaining infrastructure in AWS and Azure
  • Expertise in designing and building Envoy Gateway Infrastructure
  • Experience implementing CI/CD pipelines
  • Expertise in automating infrastructure provisioning through Terraform
  • Experience with monitoring tools like Datadog, Splunk, and ELK

Benefits For Principal Site Reliability Engineer

  • 401k
  • Medical Insurance
  • Dental Insurance
  • Vision Insurance
  • Parental Leave
