Senior/Staff Site Reliability Engineer

Description For Senior/Staff Site Reliability Engineer

Crusoe Energy is on a mission to unlock value in stranded energy resources through the power of computation. We aim to align the long-term interests of the climate with the future of global computing infrastructure. As data centers consume an exponentially growing power footprint to deliver technology to all connected devices, we are driven to ensure that the energy meeting that demand is sourced in an environmentally responsible fashion. Crusoe co-locates mobile data centers with stranded energy resources, like flare gas and underloaded renewables, to deliver low-cost, carbon-negative distributed computing solutions.

As a Site Reliability Engineer at Crusoe Energy Systems, you will play a pivotal role in ensuring the reliability and performance of our infrastructure. The SRE team is dedicated to detecting, analyzing, and preventing issues so that we uphold our Service Level Agreements (SLAs), tracked through Service Level Indicators (SLIs) and Service Level Objectives (SLOs). Through automation and proactive remediation, you will resolve common errors automatically and advise engineering teams on building resilient code.

Your day-to-day responsibilities will include:

  • Reviewing overnight alerts and system performance metrics
  • Collaborating with your team in morning stand-up meetings
  • Automating routine processes and developing tools to enhance monitoring capabilities
  • Working closely with software engineers to advise on best practices for resilient code
  • Engaging in incident response drills, post-mortems, and root cause analysis sessions
  • Maintaining high SLIs and SLOs to ensure robust and reliable infrastructure

To thrive in this role, you should have:

  • 5+ years of professional SRE experience
  • Experience contributing to architecture and design of new and current systems
  • Solid understanding of infrastructure design and operational trade-offs
  • Experience with modern infrastructure tools (Docker, Kubernetes, Ansible, etc.)
  • Experience with CI/CD practices and build systems
  • Strong programming skills (Python, Go, or similar)
  • Experience with Unix/Linux environments and TCP/IP network programming
  • Knowledge of information security best practices

Join Crusoe Energy Systems to help build and maintain the robust systems that power our innovative, environmentally responsible computing solutions.

Last updated 3 months ago

Responsibilities For Senior/Staff Site Reliability Engineer

  • Ensure reliability and performance of infrastructure
  • Detect, analyze, and prevent issues to uphold Service Level Agreements (SLAs)
  • Implement automation and proactive remediation
  • Advise engineering teams on building resilient code
  • Conduct thorough post-mortems and drive continuous improvement
  • Maintain high Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
  • Review overnight alerts and system performance metrics
  • Collaborate in team stand-up meetings
  • Automate routine processes
  • Develop tools to enhance monitoring capabilities
  • Engage in incident response drills and root cause analysis sessions

Requirements For Senior/Staff Site Reliability Engineer

  • 5+ years of professional SRE experience
  • 5+ years of experience contributing to architecture and design of new and current systems
  • Bachelor's Degree in Computer Science or related field, or 8+ years relevant work experience
  • Solid understanding of infrastructure design, including the operational trade-offs of various designs
  • Experience writing high quality code with at least one programming language (Python, Go, or similar)
  • Experience building with modern infrastructure tools such as Docker, Kubernetes, Ansible, CloudFormation, Terraform
  • Experience building with modern CI/CD practices and build systems, such as GitLab CI/CD, CircleCI, GitHub Actions
  • Experience with logging, monitoring and alerting systems and tools
  • Experience with Unix/Linux environments
  • Experience with TCP/IP and network programming
  • Experience with information security best practices
  • Excellent communication skills
  • Embody the Company values

Benefits For Senior/Staff Site Reliability Engineer

  • Hybrid work schedule
  • Competitive Paid Time Off
  • Industry competitive pay
  • Retirement benefits (401k)
  • Healthcare benefits including Medical, Dental, and Vision
  • Short and Long-Term Disability Insurance
  • Life Insurance
  • Paid Parental Leave
  • Cell phone reimbursement
  • Subscription to Calm App
