Site Reliability Engineer - Dublin

AI-first Cloud infrastructure company pioneering vertically integrated, purpose-built AI infrastructure solutions powered by clean, renewable energy.
Site Reliability
Mid-Level Software Engineer
Hybrid
1+ year of experience
AI · Enterprise SaaS · Cloud

Description For Site Reliability Engineer - Dublin

Crusoe positions itself as the World's Favorite AI-first Cloud infrastructure company. It delivers purpose-built AI infrastructure solutions that are trusted by Fortune 500 companies, while maintaining a strong commitment to environmental sustainability through the use of clean, renewable energy.

The Site Reliability Engineering (SRE) role at Crusoe is fundamental to maintaining our platform's reputation as the "gold standard" for reliability and performance. As an SRE, you'll be responsible for ensuring the robust operation of our infrastructure through proactive monitoring, automation, and problem-solving. The role involves working with cutting-edge AI infrastructure while meeting Service Level Agreements (SLAs) by tracking Service Level Indicators (SLIs) against Service Level Objectives (SLOs).

Your day-to-day responsibilities will include automating routine processes, collaborating with various engineering teams, monitoring system performance, and responding to incidents. You'll play a crucial role in building and maintaining the internal infrastructure platform that enables software teams to operate efficiently. The position requires a blend of technical expertise in areas such as distributed systems, networking, and Linux, along with strong problem-solving and communication skills.

This is an excellent opportunity for someone with 1-3 years of SRE experience who wants to make a significant impact in the AI infrastructure space while working with a company that values both technological innovation and environmental responsibility. You'll be part of a team that's setting new standards in cloud infrastructure while contributing to a more sustainable future for computing.

Last updated 2 days ago

Responsibilities For Site Reliability Engineer - Dublin

  • Automate routine processes and build the internal infrastructure platform
  • Participate in morning stand-up meetings and collaborate on action plans for data centers
  • Review overnight alerts and system performance metrics
  • Engage in incident response drills and post-mortems
  • Keep SLIs within their SLO targets
  • Document work and share insights with the team

Requirements For Site Reliability Engineer - Dublin

Python · Go · Kubernetes · Linux
  • 1-3 years of professional SRE experience
  • Experience with server-class hardware & provisioning
  • Understanding of distributed system architecture
  • Basic understanding of infrastructure design
  • Proficiency with at least one programming language (Python, Go, or similar)
  • Familiarity with infrastructure tools (Docker, Kubernetes, Ansible, etc.)
  • Experience with CI/CD practices
  • Experience with Unix/Linux environments
  • Understanding of network fundamentals
  • Bachelor's degree in Computer Science or a related field, or self-educated
  • Strong communication skills

Benefits For Site Reliability Engineer - Dublin

Medical Insurance · Dental Insurance · Vision Insurance · 401k · Parental Leave
  • Hybrid work schedule
  • Competitive Paid Time Off
  • Industry-competitive pay
  • Retirement benefits
  • Healthcare benefits including Medical, Dental, and Vision
  • Short and Long-Term Disability Insurance
  • Life Insurance
  • Paid Parental Leave
  • Subscription to Calm App
