Operations Engineer - HPC Networking Team

AI Hyperscaler delivering cloud platform services for accelerated computing since 2017, operating data centers across the US and Europe.
$90,000 - $110,000
Cloud
Entry-Level Software Engineer
Hybrid
501 - 1,000 Employees
1+ year of experience
AI · Enterprise SaaS · Cloud

Description For Operations Engineer - HPC Networking Team

CoreWeave, ranked as one of TIME100's most influential companies of 2024, is revolutionizing the AI and cloud computing landscape. As an AI Hyperscaler™, we operate an extensive network of data centers across the US and Europe, delivering cutting-edge services for accelerated computing since 2017.

The Operations Engineer role in our HPC Networking Team presents an exciting opportunity to work with some of the largest InfiniBand fabrics powering industry-leading AI workloads. You'll be responsible for the deployment, monitoring, and maintenance of these critical infrastructure components, ensuring their optimal performance and reliability.

We're seeking candidates who thrive in dynamic environments and enjoy tackling complex technical challenges. The role combines hands-on technical work with collaborative problem-solving, offering exposure to cutting-edge technology in AI and high-performance computing. You'll work with state-of-the-art tools and technologies while contributing to the infrastructure that powers next-generation AI applications.

Our company culture emphasizes innovation, flexibility, and professional growth. We offer a comprehensive benefits package, including fully paid health insurance, professional development opportunities, and a flexible hybrid work environment. This is an excellent opportunity for someone looking to make a significant impact in the rapidly evolving field of AI infrastructure while working with a team of passionate technologists.

Join CoreWeave and be part of a team that's shaping the future of AI and cloud computing, while enjoying competitive compensation, excellent benefits, and the chance to work on some of the most exciting challenges in the industry.

Last updated 23 days ago

Responsibilities For Operations Engineer - HPC Networking Team

  • Monitor performance and health of InfiniBand fabrics, switches, host adapters, and nodes
  • Investigate and resolve operational issues within InfiniBand fabrics
  • Assist with installation and operational bring-up of large InfiniBand fabrics
  • Perform routine maintenance and upgrades on InfiniBand switches
  • Collaborate with HPC cluster operations teams for troubleshooting

Requirements For Operations Engineer - HPC Networking Team

  • At least 1 year of experience with InfiniBand or similar networking technologies
  • Solid understanding of networking concepts
  • Experience with Linux system administration
  • Proficiency in at least one scripting language, such as Python or Bash
  • Knowledge of data center operations

Benefits For Operations Engineer - HPC Networking Team

  • Medical, dental, and vision insurance - 100% paid
  • Company-paid Life Insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition Reimbursement
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support
  • Paid Parental Leave
  • Flexible childcare support
  • 401(k) with employer match
  • Flexible PTO
  • Catered lunch
  • Casual work environment
