Principal Site Reliability Engineer

A leading platform-enabled unified security operations company.
McLean, VA, USA
$170,000 - $200,000
Site Reliability
Principal Software Engineer
Remote
101 - 500 Employees
8+ years of experience
Cybersecurity · Enterprise SaaS

Description For Principal Site Reliability Engineer

UltraViolet Cyber, a leading unified security operations company, is seeking a Principal Site Reliability Engineer to join its team. This role combines advanced technical expertise with leadership responsibilities, focusing on enhancing the scalability, reliability, and security of cloud infrastructure. The position offers an opportunity to work with cloud technologies in cybersecurity, particularly Amazon EKS and other AWS services.

The role demands expertise in Kubernetes, DevOps practices, and cloud infrastructure, with responsibilities ranging from system reliability management to cost optimization. You'll be working with a comprehensive tech stack including Kubernetes, Python, and various AWS services, while implementing security best practices and maintaining high-availability systems.

The company provides an attractive benefits package including 401(k) matching, comprehensive health insurance, and flexible time off. Based in McLean, Virginia, with global offices across the U.S. and India, UltraViolet Cyber serves Fortune 500 companies and Federal Government clients, offering a platform that combines technology innovation with human expertise.

This position is ideal for a seasoned professional who enjoys solving complex technical challenges, mentoring others, and working in a fast-paced environment. The role offers competitive compensation ($170,000-$200,000) and the flexibility of remote work, making it an excellent opportunity for experienced SREs looking to make a significant impact in the cybersecurity space.

Last updated 12 days ago

Responsibilities For Principal Site Reliability Engineer

  • Ensure availability, performance, scalability, and security of cloud-based services
  • Architect, deploy, and maintain Kubernetes clusters using Amazon EKS
  • Automate infrastructure provisioning using IaC tools
  • Build and maintain CI/CD pipelines
  • Design and implement monitoring, alerting, and logging solutions
  • Enforce security best practices and compliance
  • Conduct capacity planning and scaling
  • Lead cross-functional collaboration
  • Manage incidents and perform root cause analysis
  • Optimize cloud costs while maintaining performance

Requirements For Principal Site Reliability Engineer

  • Extensive experience in AWS, particularly with EKS clusters
  • Strong proficiency in the Kubernetes ecosystem
  • Hands-on experience with DevOps tools & methodologies
  • Proficiency in Python, Bash, or Golang
  • Experience with observability and monitoring tools
  • Deep understanding of networking principles
  • Strong background in security best practices
  • Experience with highly available, distributed systems
  • Previous experience in an Agile or DevOps culture
  • Excellent troubleshooting skills
  • Strong communication and leadership skills
  • Bachelor's degree in Computer Science, Engineering, or related field

Benefits For Principal Site Reliability Engineer

  • 401(k) with employer match of 100% of first 3% and 50% of next 2%
  • Medical, Dental, and Vision Insurance
  • Group Term Life Insurance
  • Short-Term Disability
  • Long-Term Disability
  • Discretionary Time Off (DTO) Program
  • 11 Paid Holidays Annually

