Production Systems Engineer, AI Systems

Meta builds technologies that help people connect, find communities, and grow businesses through social platforms like Facebook, Instagram, WhatsApp, and immersive AR/VR experiences.
$132,000 - $191,000
DevOps
Senior Software Engineer
In-Person
5,000+ Employees
4+ years of experience
AI

Description For Production Systems Engineer, AI Systems

Meta is seeking an experienced Production Systems Engineer to join its Release to Production (RTP) team, focusing on AI/ML initiatives. This role is central to Meta's AI infrastructure, working with cutting-edge hardware and software systems that power its AI capabilities.

The position involves managing the end-to-end hardware lifecycle of Meta's servers, including prototyping experimental hardware, conducting pre-production debugging, and implementing automated system monitoring. The role requires expertise in network technologies, including NICs, switches, optics, and various protocols, with a focus on supporting Meta's AI systems at scale.

As a Production Systems Engineer, you'll work closely with cross-functional teams, including hardware designers, networking teams, and system manufacturers. You'll be responsible for driving the integration of new AI platforms, creating diagnostic tools, and developing solutions for hardware health issues. The role combines hands-on technical work with strategic system planning and optimization.

The ideal candidate has strong experience with Linux systems, network technologies, and troubleshooting complex systems. Knowledge of AI workload requirements and experience with large-scale deployments are highly valuable. The position offers competitive compensation ($132,000-$191,000/year) plus bonus and equity, along with comprehensive benefits.

This is an opportunity for someone passionate about infrastructure and AI systems to help build and maintain the systems that power Meta's AI initiatives. The role offers exposure to cutting-edge technology and the chance to work on systems at massive scale.

Responsibilities For Production Systems Engineer, AI Systems

  • Support the introduction of new AI platforms into the Meta fleet by driving scale-up and scale-out interface integration
  • Create experiments and tooling to detect and diagnose hardware/firmware/software health issues
  • Develop an understanding of AI workload traffic and incorporate it into new product introduction (NPI)
  • Enable hacks for future technology explorations in the AI space
  • Troubleshoot, diagnose, and root-cause system failures
  • Develop visibility through data visualization
  • Implement systemic solutions to hardware health issues
  • Drive continuous product quality improvement

Requirements For Production Systems Engineer, AI Systems

Linux
Python
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 4+ years of work experience in network ASIC/Platform development, network product deployment, or Interconnect Technologies
  • Knowledge of server architecture and components
  • Experience working with Linux
  • Knowledge of TCP/IP and experience using iperf
  • Hands-on troubleshooting and debugging experience

Benefits For Production Systems Engineer, AI Systems

Medical Insurance
Dental Insurance
Vision Insurance
  • Bonus
  • Equity
  • Comprehensive benefits
