Senior Software Engineer, MLOps and Infrastructure

Training and deploying frontier models for developers and enterprises building AI systems for content generation, semantic search, RAG, and agents.
DevOps
Senior Software Engineer
Hybrid
5+ years of experience
AI

Description For Senior Software Engineer, MLOps and Infrastructure

Cohere is on a mission to scale intelligence to serve humanity, focusing on training and deploying frontier models for AI systems. As a Senior Software Engineer in MLOps and Infrastructure, you'll join a team responsible for building critical infrastructure that underpins Cohere's success. The role demands expertise in designing and managing large-scale distributed systems, particularly with Kubernetes and GPU workloads. You'll work with cutting-edge cloud technologies across GCP, Azure, AWS, and OCI, while building automated systems for deployment and operations.

The position requires participation in a 24x7 on-call rotation (compensated) and targets candidates based in EMEA. You'll be instrumental in developing self-service systems, custom Kubernetes operators, and ensuring robust observability and resilience. The ideal candidate brings 5+ years of production infrastructure experience, strong troubleshooting skills, and a collaborative mindset.

Cohere offers an inclusive work environment with offices in Toronto, New York, San Francisco, and London, providing flexibility with remote work options. The company values diversity and offers comprehensive benefits including health and dental coverage, mental health support, parental leave, and generous vacation time. This is an opportunity to work with world-class professionals while shaping the future of AI infrastructure.

If you're passionate about building systems that enhance developer productivity, have experience with Go programming, and enjoy contributing to open-source solutions, this role offers the chance to make a significant impact in the AI industry while working with cutting-edge technology.


Responsibilities For Senior Software Engineer, MLOps and Infrastructure

  • Build self-service systems that automate the management, deployment, and operation of services
  • Build custom Kubernetes operators that support language model deployments
  • Automate environment observability and resilience
  • Ensure defined SLOs are met, including participation in a 24x7 on-call rotation
  • Build strong relationships with internal developers and influence the Infrastructure team's roadmap
  • Develop the team through knowledge sharing and an active review process

Requirements For Senior Software Engineer, MLOps and Infrastructure

Go
Kubernetes
Linux
  • 5+ years of engineering experience running production infrastructure at large scale
  • Experience designing large, highly available distributed systems with Kubernetes and GPU workloads
  • Experience working with GCP, Azure, AWS and/or OCI
  • Experience designing, deploying, supporting, and troubleshooting complex Linux-based computing environments
  • Excellent collaboration and troubleshooting skills
  • The grit and adaptability to solve complex technical challenges

Benefits For Senior Software Engineer, MLOps and Infrastructure

  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits
  • Mental health budget
  • 100% Parental Leave top-up for 6 months
  • Personal enrichment benefits
  • Remote-flexible work
  • Co-working stipend
  • 6 weeks of vacation
