Production Systems Engineer, AI Systems

Meta builds technologies that help people connect, find communities, and grow businesses.
$170,000 - $240,000
Backend
Senior Software Engineer
In-Person
5,000+ Employees
6+ years of experience
AI · Enterprise SaaS

Description For Production Systems Engineer, AI Systems

Meta is seeking a Systems Engineer to join our Release to Production (RTP) team, working on the Meta Training and Inference Accelerator (MTIA) program as part of the AI/ML initiatives supporting large-scale AI training and inference. The RTP team is responsible for the end-to-end hardware lifecycle of all Meta servers, including prototyping, debugging, and stress testing. We are looking for a candidate to work on scale-up and scale-out network technologies for the MTIA systems powering Meta's AI advancements.

Responsibilities:

  • Support new MTIA platform introduction
  • Create experiments and tooling for hardware/firmware/software health issues
  • Develop understanding of AI workload traffic
  • Contribute to enabling hacks for future AI technology explorations
  • Troubleshoot and diagnose system failures
  • Develop visibility through data visualization
  • Drive continuous product quality improvement

Requirements:

  • Bachelor's degree in Engineering or Computer Science
  • 6+ years of work experience in relevant domains
  • Knowledge of server architecture and components
  • Experience with Linux, TCP/IP, and iperf
  • Hands-on troubleshooting and debugging experience

Preferred Qualifications:

  • Experience with Network Interface Cards (NICs)
  • Experience with RDMA/RoCE
  • Experience with full server systems, including PCIe
  • Experience with large scale deployments

Join Meta to shape the future of social technology beyond 2D screens, pushing the boundaries of augmented and virtual reality.

Last updated a month ago

Requirements For Production Systems Engineer, AI Systems

  • Bachelor's degree in Engineering or Computer Science
  • 6+ years of work experience in relevant domains
  • Knowledge of server architecture and components
  • Experience working with Linux
  • Knowledge of TCP/IP and experience using iperf
  • Hands-on troubleshooting and debugging experience

Benefits For Production Systems Engineer, AI Systems

  • Medical insurance
  • Dental insurance
  • Vision insurance
  • Bonus
  • Equity
  • Benefits
