Production Systems Engineer, AI Systems

Meta builds technologies that help people connect, find communities, and grow businesses through social technology and immersive experiences.
$163,000 - $225,000
Cloud
Senior Software Engineer
In-Person
5,000+ Employees
6+ years of experience
AI · Enterprise SaaS

Description For Production Systems Engineer, AI Systems

Meta is seeking a Systems Engineer for its Release to Production (RTP) team, which works on the Meta Training and Inference Accelerator (MTIA) program supporting large-scale AI training and inference. The role focuses on end-to-end hardware lifecycle management of Meta servers, including prototyping, debugging, and system monitoring. The position specifically emphasizes scale-up and scale-out network technologies for the MTIA systems powering Meta's AI initiatives.

The ideal candidate will work closely with cross-functional teams, including hardware designers, networking teams, system manufacturers, and data center operations. They will be responsible for validating network interfaces at the protocol and system level and for managing system validation through to mass production. The role requires strong expertise in network protocols (TCP/IP, RDMA) and hands-on experience with post-silicon validation.

This is an opportunity to join Meta's AI/ML initiatives and contribute to the infrastructure that powers its services. The position offers compensation ranging from $163,000 to $225,000 annually, plus bonus, equity, and benefits. The role is based in Austin, with a focus on hands-on system work and collaboration with internal and external partners.

The successful candidate will have the opportunity to work on cutting-edge AI infrastructure, develop solutions for hardware health issues, and drive continuous improvement in product quality. This role is perfect for someone with a strong background in networking technologies and system engineering who wants to make an impact on Meta's AI infrastructure development.


Responsibilities For Production Systems Engineer, AI Systems

  • Support the introduction of new MTIA platforms into the Meta fleet by working with the post-silicon validation team
  • Create experiments and tooling to detect, reproduce, and diagnose hardware/firmware/software health issues
  • Develop an understanding of AI workload traffic and incorporate it into New Product Introduction (NPI)
  • Contribute to enabling hacks for future technology explorations in the AI space
  • Troubleshoot, diagnose, and root-cause system failures
  • Develop visibility through data visualization and implement systemic solutions
  • Drive external and internal teams to continuously improve product quality

Requirements For Production Systems Engineer, AI Systems

  • Bachelor's degree in Engineering or Computer Science
  • 6+ years of work experience in Network ASIC development, Network Product deployment, or Interconnect Technologies
  • Knowledge of server architecture and components
  • Experience working with Linux
  • Knowledge of TCP/IP and experience using iperf
  • Hands-on troubleshooting and debugging experience

Benefits For Production Systems Engineer, AI Systems

  • Bonus
  • Equity
  • Benefits

