Senior Software Engineer, Infrastructure

CentML develops AI infrastructure to reduce the cost of developing and deploying ML models, enabling widespread access to AI technology.
Cloud
Senior Software Engineer
Hybrid
4+ years of experience
AI · Enterprise SaaS

Description For Senior Software Engineer, Infrastructure

CentML is revolutionizing the AI infrastructure landscape with a mission to democratize AI by significantly reducing the costs associated with ML model development and deployment. The company is led by a distinguished team of experts from leading tech companies and is spearheaded by co-founder and CEO Gennady Pekhimenko, a renowned expert in ML systems.

As a Senior Software Engineer in Infrastructure, you will play a pivotal role in shaping the future of ML infrastructure. You'll be responsible for designing and developing the CentML platform's deployment infrastructure, which manages ML training and inference across multiple cloud providers, including AWS, GCP, Azure, CoreWeave, and OCI. This role combines deep technical expertise in containerization, cloud infrastructure, and GPU technologies with the opportunity to lead a team of engineers.

The position offers an exciting opportunity to work on cutting-edge technology that directly impacts the accessibility of AI technology. You'll be working with state-of-the-art GPU clusters, implementing sophisticated scheduling solutions, and ensuring the platform's scalability and performance. The role requires a strong background in containerized deployment systems, cloud infrastructure, and programming languages like Python, Java, and Go.

Working at CentML means joining a company that values diversity, inclusion, and work-life balance. The company offers competitive benefits including equity options, comprehensive healthcare, and professional development opportunities. Whether you're based in Toronto or San Francisco, you'll be part of a team that's pushing the boundaries of what's possible in AI infrastructure.


Responsibilities For Senior Software Engineer, Infrastructure

  • Design and lead the development of the deployment infrastructure of the CentML platform
  • Implement GPU cluster scheduling solutions for large-scale ML training and inference workloads
  • Communicate with product teams and define new features and goals for improving the CentML platform

Requirements For Senior Software Engineer, Infrastructure

Python
Java
Go
Kubernetes
  • 4+ years of experience working with containerized deployment systems
  • Experience deploying and managing cloud infrastructure on AWS, GCP, and Azure
  • Strong coding skills in languages like Python, Java, Go, and/or C/C++
  • Knowledge of GPU architecture and NVIDIA GPU virtualization technologies is highly desirable
  • Past experience building GPU clusters for large-scale ML training and inference is desirable

Benefits For Senior Software Engineer, Infrastructure

Equity
Medical Insurance
Dental Insurance
Parental Leave
Education Budget
  • An open and inclusive work environment
  • Employee stock options
  • Best-in-class medical and dental benefits
  • Parental leave top-up
  • Professional development budget
  • Flexible vacation time
