Production Systems Engineer, AI Systems

Meta builds technologies that help people connect, find communities, and grow businesses.
$170,000 - $240,000
Backend
Senior Software Engineer
In-Person
5,000+ Employees
6+ years of experience
AI · Enterprise SaaS
This job posting may no longer be active.

Description For Production Systems Engineer, AI Systems

Meta is seeking a Systems Engineer to join our Release to Production (RTP) team, working on the Meta Training and Inference Accelerator (MTIA) program as part of the AI/ML initiatives supporting large-scale AI training and inference. The RTP team is responsible for the end-to-end hardware lifecycle of all Meta servers, including prototyping, debugging, and stress testing. We are looking for a candidate to work on scale-up and scale-out network technologies for the MTIA systems powering Meta's AI advancements.

Responsibilities:

  • Support new MTIA platform introduction
  • Create experiments and tooling for hardware/firmware/software health issues
  • Develop understanding of AI workload traffic
  • Contribute to enabling hacks for future AI technology explorations
  • Troubleshoot and diagnose system failures
  • Develop visibility through data visualization
  • Drive continuous product quality improvement

Requirements:

  • Bachelor's degree in Engineering or Computer Science
  • 6+ years of work experience in relevant domains
  • Knowledge of server architecture and components
  • Experience with Linux, TCP/IP, and iperf
  • Hands-on troubleshooting and debugging experience

Preferred Qualifications:

  • Experience with Network Interface Cards (NICs)
  • Experience with RDMA/RoCE
  • Experience with full server systems, including PCIe
  • Experience with large scale deployments

Join Meta to shape the future of social technology beyond 2D screens, pushing the boundaries of augmented and virtual reality.

Last updated 8 months ago

Benefits For Production Systems Engineer, AI Systems

  • Medical insurance
  • Dental insurance
  • Vision insurance
  • Bonus
  • Equity
  • Benefits