Software Engineer, Systems ML - Collective Compute Enablement

Meta builds technologies that help people connect, find communities, and grow businesses through social technology and immersive experiences.
Oslo, Norway
Machine Learning
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI

Description For Software Engineer, Systems ML - Collective Compute Enablement

Meta is seeking a Senior Software Engineer for its Collective Compute Enablement team, which focuses on maximizing the training performance of Generative AI and Recommendation models on Meta's Training and Inference Accelerator (MTIA). This role combines software engineering expertise with machine learning systems optimization and requires deep technical knowledge of AI infrastructure and hardware acceleration.

The position involves working with cutting-edge technology to optimize and scale AI training workloads, particularly for Large Language Models (LLMs) and Deep Learning Recommendation Models (DLRMs). You'll be at the forefront of AI infrastructure development, working on next-generation training superclusters and collaborating with various teams to improve end-to-end performance of large-scale training systems.

As a senior member of the team, you'll have the opportunity to influence architectural decisions, mentor other engineers, and lead complex technical initiatives. The role requires expertise in C++, Python, and AI frameworks, with a focus on performance optimization and distributed systems. You'll be working with Meta's proprietary hardware accelerator (MTIA) and contributing to the development of next-generation AI experiences.

The ideal candidate has a strong background in computer science or a related STEM field, with specialized experience in AI infrastructure, machine learning systems, or hardware acceleration. Experience with distributed AI systems, communication protocols, and large-scale model training is highly valued. This role offers the chance to work on some of the most challenging problems in AI infrastructure while contributing to Meta's mission of connecting people and building immersive experiences.

Working at Meta provides the opportunity to impact billions of users while pushing the boundaries of what's possible in AI and social technology. The company offers a collaborative environment where you'll work with talented engineers and researchers, access to cutting-edge technology, and the chance to shape the future of digital communication and immersive experiences.

Responsibilities For Software Engineer, Systems ML - Collective Compute Enablement

  • Apply state-of-the-art AI infrastructure and software/hardware acceleration techniques to build and optimize large-scale AI workloads
  • Analyze, benchmark, and optimize large-scale workloads on next-generation training superclusters
  • Define use cases and develop methodology and benchmarks to evaluate different approaches
  • Set direction and goals for the team related to project impact, AI system design, infrastructure, and developer efficiency
  • Lead large and complex technical efforts across many engineers and teams
  • Influence next-generation model and hardware architecture choices through thorough, data-driven analysis
  • Help onboard new team members, provide mentorship, and enable a successful ramp-up on the team's code base
  • Mentor other engineers and research scientists, and improve the quality of engineering work across the broader team

Requirements For Software Engineer, Systems ML - Collective Compute Enablement

  • Bachelor's degree in computer science or a related STEM field
  • Specialized experience in machine learning/AI domains: hardware accelerator architectures, machine learning compilers or ML systems, AI infrastructure, high-performance computing
  • Proven C/C++ and Python programming skills in developing AI Systems infrastructure or AI algorithms
  • Experience debugging C++, Python, and/or PyTorch code
  • Track record of defining and leading long-term plans for the team
  • Track record of mentoring and growing other engineers
  • Must obtain work authorization in the country of employment
  • Technical leadership experience
