Principal Site Reliability Engineer

A privately held financial services company focused on making financial expertise broadly accessible and effective in helping people live the lives they want.
Site Reliability
Principal Software Engineer
Hybrid
5+ years of experience
Finance · Enterprise SaaS

Description For Principal Site Reliability Engineer

Fidelity Investments is seeking a Principal Site Reliability Engineer to build and operate highly resilient platforms in AWS cloud environments. This role involves coordinating systems using Infrastructure as Code tools, performing reliability engineering throughout the SDLC, and deploying distributed multi-tiered applications using Kubernetes and CI/CD pipelines.

The ideal candidate will create and maintain dashboards to capture application performance metrics using tools like Splunk, Grafana, Prometheus, and Datadog. They will be responsible for creating SLI/SLO dashboards, identifying and resolving application issues, and supporting applications hosted in AWS Cloud and Kubernetes.

Key responsibilities include providing automated solutions for operational activities, analyzing application observability and performance, conducting root cause analysis, and ensuring business continuity.

Requirements include a Bachelor's degree in Computer Science or a related field with 5 years of experience, or a Master's degree with 3 years of experience. The candidate must have demonstrated expertise in site reliability engineering, Kubernetes platforms, and automation tools.

Fidelity offers a comprehensive benefits package including 401(k) with company match, medical coverage, parental leave, and student loan assistance. The position is based in Westlake, TX, with a hybrid working model that requires onsite presence every other week.

Responsibilities For Principal Site Reliability Engineer

  • Build and operate resilient platforms in AWS cloud environments
  • Create and maintain performance monitoring dashboards
  • Perform reliability engineering throughout the SDLC
  • Deploy and support distributed multi-tiered applications
  • Provide automated solutions for operational activities
  • Conduct root cause analysis and resolve critical issues
  • Manage application scalability and resiliency
  • Mentor junior team members

Requirements For Principal Site Reliability Engineer

  • Skill tags: Python, Kubernetes, Redis, Java, Node.js
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field
  • 5 years of experience (with a Bachelor's) or 3 years (with a Master's) as a Principal Site Reliability Engineer
  • Expertise in site reliability engineering and performance analysis
  • Experience with Kubernetes platforms and cloud environments
  • Knowledge of monitoring tools like Splunk, Grafana, Prometheus, and Datadog
  • Proficiency in Python, shell scripting, Git, and Docker

Benefits For Principal Site Reliability Engineer

  • 401(k) with company match
  • Medical, dental, vision, and prescription drug coverage
  • 16-week maternity leave and 12-week parental leave
  • Student loan assistance
