Senior Software Engineer - Ceph

A startup building large language tools, founded by Alex Smola and Mu Li, working on high-quality generative AI models for language, audio, and entertainment.
$150,000 - $250,000
DevOps
Senior Software Engineer
Hybrid
5+ years of experience
AI

Description For Senior Software Engineer - Ceph

Boson AI, a startup in the AI space, is seeking a Senior Software Engineer with deep expertise in Ceph management for its deep learning datacenter in Toronto. Founded by Alex Smola and Mu Li, the company develops generative AI models for language, audio, and entertainment.

The role offers an opportunity to work with cutting-edge hardware: NVIDIA H100 and A100 GPUs, over 25PB of disk and 5PB of flash storage, terabit networking, and hundreds of computers. The position requires strong problem-solving skills and the ability to learn new tools quickly.

As a Senior Software Engineer, you'll be responsible for deploying and operating Ceph and integrating it with various infrastructure technologies and hardware systems. The role involves working with technologies such as Slurm, MAAS, InfiniBand, NVIDIA DeepOps, and Layer 3 networking. Hardware configuration experience is necessary.

Prior Ceph experience is a strict requirement. The role is hybrid, with access to state-of-the-art infrastructure. The compensation range of $150,000 - $250,000 reflects the seniority of the role and the expertise required.

This is an excellent opportunity for a seasoned DevOps engineer who wants to work at the intersection of infrastructure and AI, managing critical storage systems that power cutting-edge AI research and development. The role offers the chance to work with the latest technology stack and contribute to the advancement of AI infrastructure.

Responsibilities For Senior Software Engineer - Ceph

  • Design, manage and maintain large storage arrays
  • Integrate them with Deep Learning infrastructure
  • Support troubleshooting for MAAS, Slurm and Kubernetes as needed
  • Configure and automate on-premises Linux-based systems at scale using infrastructure-as-code practices
  • Learn about new tools and deploy them

Requirements For Senior Software Engineer - Ceph

Linux
Python
Kubernetes
  • Strong background in maintaining Ceph clusters
  • Experience with high performance computing
  • Experience with on-premises Data Center operations and technologies
  • Experience in managing a large hardware cluster
  • Proficiency in at least one programming language (e.g. Python) and ability to write clean, maintainable code
  • Experience managing firmware and system updates, e.g. on Supermicro hardware
