Cohere is on a mission to scale intelligence to serve humanity, focusing on training and deploying frontier models for AI systems. As a Senior Software Engineer in MLOps and Infrastructure, you'll join a team responsible for building critical infrastructure that underpins Cohere's success. The role demands expertise in designing and managing large-scale distributed systems, particularly with Kubernetes and GPU workloads. You'll work with cutting-edge cloud technologies across GCP, Azure, AWS, and OCI, while building automated systems for deployment and operations.
The position requires participation in a compensated 24x7 on-call rotation and targets candidates based in EMEA. You'll be instrumental in developing self-service systems and custom Kubernetes operators, and in ensuring robust observability and resilience. The ideal candidate brings 5+ years of production infrastructure experience, strong troubleshooting skills, and a collaborative mindset.
Cohere offers an inclusive work environment with offices in Toronto, New York, San Francisco, and London, along with the flexibility of remote work options. The company values diversity and offers comprehensive benefits, including health and dental coverage, mental health support, parental leave, and generous vacation time. This is an opportunity to work with world-class professionals while shaping the future of AI infrastructure.
If you're passionate about building systems that enhance developer productivity, have experience programming in Go, and enjoy contributing to open-source solutions, this role offers the chance to make a significant impact in the AI industry while working with cutting-edge technology.