Senior Software Engineer, MLOps and Infrastructure

Cohere trains and deploys frontier models for developers and enterprises building AI systems for content generation, semantic search, RAG, and agents.
DevOps
Senior Software Engineer
Hybrid
5+ years of experience
AI

Description For Senior Software Engineer, MLOps and Infrastructure

Cohere is on a mission to scale intelligence to serve humanity, focusing on training and deploying frontier models for AI systems. As a Senior Software Engineer in MLOps and Infrastructure, you'll join a team responsible for building critical infrastructure that underpins Cohere's success. The role demands expertise in designing and managing large-scale distributed systems, particularly with Kubernetes and GPU workloads. You'll work with cutting-edge cloud technologies across GCP, Azure, AWS, and OCI, while building automated systems for deployment and operations. The position requires participation in a 24x7 on-call rotation (compensated) and targets candidates in the EMEA region.

The ideal candidate brings 5+ years of production infrastructure experience, strong knowledge of Linux environments, and excellent troubleshooting skills. You'll be responsible for building self-service systems and custom Kubernetes operators, and for ensuring robust observability and resilience. The role offers opportunities to work with state-of-the-art AI infrastructure while collaborating with a diverse team of world-class professionals.

Benefits include comprehensive health coverage, mental health support, parental leave, flexible work arrangements, and generous vacation time. Cohere values diversity and maintains an inclusive work environment, welcoming applicants from all backgrounds. This position offers the chance to shape the future of AI infrastructure while working with a team passionate about their craft and committed to customer success.

Last updated 2 days ago

Responsibilities For Senior Software Engineer, MLOps and Infrastructure

  • Build self-service systems that automate managing, deploying and operating services
  • Build custom Kubernetes operators that support language model deployments
  • Automate environment observability and resilience
  • Ensure defined SLOs are met, including participation in 24x7 on-call rotation
  • Build strong relationships with internal developers and influence the Infrastructure team's roadmap
  • Develop the team through knowledge sharing and an active review process

Requirements For Senior Software Engineer, MLOps and Infrastructure

Go
Kubernetes
Linux
  • 5+ years of engineering experience running production infrastructure at large scale
  • Experience designing large, highly available distributed systems with Kubernetes and GPU workloads
  • Experience working with GCP, Azure, AWS and/or OCI
  • Experience designing, deploying, supporting, and troubleshooting complex Linux-based computing environments
  • Excellent collaboration and troubleshooting skills
  • The grit and adaptability to solve complex technical challenges

Benefits For Senior Software Engineer, MLOps and Infrastructure

Dental Insurance
Medical Insurance
Mental Health Assistance
Parental Leave
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits
  • Mental health budget
  • 100% Parental Leave top-up for 6 months (Canada, US, and UK)
  • Personal enrichment benefits for arts, culture, fitness, and workspace
  • Remote-flexible with offices in Toronto, New York, San Francisco and London
  • Co-working stipend
  • 6 weeks of vacation
