Cohere is at the forefront of AI development, training and deploying frontier models for enterprises and developers. As a Staff Software Engineer in ML Ops and Infrastructure, you'll be instrumental in building the foundation that powers Cohere's AI systems. The role demands expertise in large-scale infrastructure management, with a focus on Kubernetes and GPU workloads. You'll be working in EMEA, joining a team that values technical excellence and collaborative problem-solving.
The position requires strong experience with cloud platforms (GCP, Azure, AWS, OCI) and Linux environments. You'll be responsible for developing self-service systems and custom Kubernetes operators, and for ensuring robust observability. The role includes participation in a compensated 24x7 on-call rotation and requires 5+ years of engineering experience.
Cohere offers an inclusive work environment with benefits that include comprehensive health coverage, parental leave, and flexible remote work options. The company maintains offices in major tech hubs and provides 6 weeks of vacation. Cohere values diversity, encourages applications from all backgrounds, and provides accommodations as needed during recruitment.
The ideal candidate will have proven production experience with Kubernetes, hands-on coding experience in Go, and a passion for building systems that enhance team productivity. You'll work with cutting-edge AI technology while contributing to open-source solutions and mentoring team members. The role offers an opportunity to shape the future of AI infrastructure while working with some of the best talent in the field.