Cohere is on a mission to scale intelligence to serve humanity, with a focus on training and deploying frontier AI models. As a Senior Software Engineer in MLOps and Infrastructure, you'll join the team that builds the critical infrastructure underpinning Cohere's success. The role demands expertise in designing and managing large-scale distributed systems, particularly Kubernetes clusters running GPU workloads. You'll work with cutting-edge cloud technologies across GCP, Azure, AWS, and OCI while building automated systems for deployment and operations. The position requires participation in a compensated 24/7 on-call rotation and is open to candidates in the EMEA region.
The ideal candidate brings 5+ years of production infrastructure experience, strong knowledge of Linux environments, and excellent troubleshooting skills. You'll be responsible for building self-service systems and custom Kubernetes operators, and for ensuring robust observability and resilience. The role offers the opportunity to work with state-of-the-art AI infrastructure while collaborating with a diverse team of world-class professionals.
Benefits include comprehensive health coverage, mental health support, parental leave, flexible work arrangements, and generous vacation time. Cohere values diversity and maintains an inclusive work environment, welcoming applicants from all backgrounds. This position offers the chance to shape the future of AI infrastructure while working with a team passionate about their craft and committed to customer success.