Crusoe is reshaping the cloud landscape as the world's favorite AI-first cloud infrastructure company. We deliver purpose-built AI infrastructure that is trusted by Fortune 500 companies, while maintaining a strong commitment to environmental sustainability through the use of clean, renewable energy.
The Site Reliability Engineering (SRE) role at Crusoe is fundamental to maintaining our platform's reputation as the "gold standard" for reliability and performance. As an SRE, you'll be responsible for keeping our infrastructure running reliably through proactive monitoring, automation, and problem-solving. The role involves working with cutting-edge AI infrastructure and meeting our Service Level Agreements (SLAs) by tracking Service Level Indicators (SLIs) against Service Level Objectives (SLOs).
Your day-to-day responsibilities will include automating routine processes, collaborating with various engineering teams, monitoring system performance, and responding to incidents. You'll play a crucial role in building and maintaining the internal infrastructure platform that enables software teams to operate efficiently. The position requires a blend of technical expertise in areas such as distributed systems, networking, and Linux, along with strong problem-solving and communication skills.
This is an excellent opportunity for someone with 1 to 3 years of SRE experience who wants to make a significant impact in the AI infrastructure space at a company that values both technological innovation and environmental responsibility. You'll be part of a team that's setting new standards in cloud infrastructure while contributing to a more sustainable future for computing.