Production Systems Engineer, AI Systems

Meta builds technologies that help people connect, find communities, and grow businesses.
$170,000 - $240,000
Backend
Senior Software Engineer
In-Person
5,000+ Employees
6+ years of experience
AI · Enterprise SaaS

Description For Production Systems Engineer, AI Systems

Meta is seeking a Systems Engineer to join our Release to Production (RTP) team, working on the Meta Training and Inference Accelerator (MTIA) program as part of the AI/ML initiatives supporting large-scale AI training and inference. The RTP team is responsible for the end-to-end hardware lifecycle of all Meta servers, including prototyping, debugging, and stress testing. We are looking for a candidate to work on scale-up and scale-out network technologies for the MTIA systems powering Meta's AI advancements.

Responsibilities:

  • Support new MTIA platform introduction
  • Create experiments and tooling for hardware/firmware/software health issues
  • Develop an understanding of AI workload traffic
  • Contribute hacks that enable exploration of future AI technologies
  • Troubleshoot and diagnose system failures
  • Develop visibility through data visualization
  • Drive continuous product quality improvement

Requirements:

  • Bachelor's degree in Engineering or Computer Science
  • 6+ years of work experience in relevant domains
  • Knowledge of server architecture and components
  • Experience with Linux, TCP/IP, and iperf
  • Hands-on troubleshooting and debug experience

Preferred Qualifications:

  • Experience with Network Interface Cards (NICs)
  • Experience with RDMA/RoCE
  • Experience with full server systems, including PCIe
  • Experience with large-scale deployments

Join Meta to shape the future of social technology beyond 2D screens, pushing the boundaries of augmented and virtual reality.

Last updated 8 days ago

Benefits For Production Systems Engineer, AI Systems

  • Medical insurance
  • Dental insurance
  • Vision insurance
  • Bonus
  • Equity
