Production Systems Engineer, AI Systems

Meta builds technologies that help people connect, find communities, and grow businesses through social platforms and immersive experiences.
$163,000 - $225,000
Cloud
Senior Software Engineer
In-Person
5,000+ Employees
6+ years of experience
AI · Enterprise SaaS

Description For Production Systems Engineer, AI Systems

Meta is seeking a Systems Engineer for its Release to Production (RTP) team focusing on AI/ML initiatives supporting large-scale AI training and inference. The role involves managing the end-to-end hardware lifecycle of Meta servers, including prototyping, debugging, and stress testing. The position requires expertise in network technologies and hands-on experience with the phases of the hardware/software lifecycle.

The RTP team plays a crucial role in Meta's infrastructure, working closely with cross-functional partners including hardware designers, networking teams, system manufacturers, and data center operations teams. The ideal candidate will support scale-up and scale-out network technologies for Meta AI systems, bringing knowledge of network technologies (NICs, switches, optics, DACs, and protocols such as TCP/IP and RDMA) and practical experience in system integration and validation.

This role offers an opportunity to work at the forefront of AI infrastructure, contributing to Meta's advancement in artificial intelligence. The position combines technical expertise with practical problem-solving, requiring both deep knowledge of network technologies and the ability to implement solutions at scale. The successful candidate will be part of Meta's mission to build the next evolution in social technology, working beyond traditional computing constraints.

The compensation package is competitive, ranging from $163,000 to $225,000 annually, plus bonus, equity, and comprehensive benefits. This is an excellent opportunity for experienced professionals looking to impact the future of AI infrastructure at one of the world's leading technology companies.

Last updated 6 days ago

Responsibilities For Production Systems Engineer, AI Systems

  • Lead integration of scale-up and scale-out interfaces for AI platforms
  • Develop understanding of Collective Communication patterns/AI workloads
  • Create experiments and tooling to detect, reproduce and diagnose hardware/firmware/software issues
  • Contribute to enabling hacks for future technology explorations in the AI space
  • Troubleshoot, diagnose, and root-cause system failures
  • Develop visibility through data visualization
  • Implement systemic solutions to hardware health issues
  • Drive continuous product quality improvement

Requirements For Production Systems Engineer, AI Systems

  • Bachelor's degree in Engineering or Computer Science
  • 6+ years of work experience in Network ASIC/Platform Development or Network Product Deployment
  • Knowledge of TCP/IP and experience using tools like iperf/uperf
  • Knowledge of server architecture and components
  • Experience working with Linux
  • Hands-on troubleshooting and debugging experience

Benefits For Production Systems Engineer, AI Systems

  • Medical insurance
  • Dental insurance
  • Vision insurance
  • 401k
  • Equity
  • Bonus
