Production Systems Engineer, Fleet AI Systems

Meta builds technologies that help people connect, find communities, and grow businesses through social platforms and immersive experiences.
$163,000 - $225,000
DevOps
Staff Software Engineer
In-Person
5,000+ Employees
7+ years of experience
AI · Enterprise SaaS

Description For Production Systems Engineer, Fleet AI Systems

Meta is seeking a Production Systems Engineer to join its Release to Production (RTP) team, focusing on Fleet AI Systems. The role sits at the intersection of hardware infrastructure and AI systems and requires technical expertise in both domains.

The position involves working with Meta's extensive server infrastructure and data centers, which form the backbone of its rapidly scaling operations. As a Production Systems Engineer, you'll be responsible for the complete hardware lifecycle of Meta's servers, including critical pre-production testing, system debugging, and implementing monitoring solutions.

The role requires collaboration with multiple teams, including hardware designers, system manufacturers, component vendors, and various internal engineering teams. You'll be instrumental in ensuring systems are thoroughly tested before deployment to production data centers and maintaining the health and lifecycle of servers in production environments.

Key responsibilities include developing and executing test suites for various architectures, creating automated testing frameworks, and implementing system monitoring solutions. You'll also be involved in troubleshooting complex hardware and software issues, developing data visualization tools, and establishing best practices for hardware infrastructure management at scale.

The ideal candidate should have extensive experience (7+ years) in hardware systems technologies or supporting production hardware at scale, strong Linux expertise, and a proven track record in server system architecture. Experience with AI/HPC systems at scale is particularly valuable for this role.

Meta offers a competitive compensation package ranging from $163,000 to $225,000 per year, plus bonus and equity opportunities. The position is based in Menlo Park, CA, at Meta's headquarters, where you'll work with cutting-edge technology and contribute to the infrastructure supporting billions of users worldwide.

This role combines hardware expertise with AI systems knowledge in support of Meta's mission of connecting people and building the future of social technology. It provides exposure to large-scale systems and the chance to solve complex technical challenges in a fast-paced environment.


Responsibilities For Production Systems Engineer, Fleet AI Systems

  • Interface with external vendors and internal teams to develop and execute test suites for various architectures
  • Create experiments and tooling to detect and diagnose hardware/firmware/software health issues
  • Develop a test framework for large-scale test automation
  • Implement remediations across software and hardware stack
  • Develop and publish updates on resolutions
  • Troubleshoot and diagnose system failures
  • Develop visibility through data visualization
  • Drive discussions on test specification and methodologies
  • Develop robust practices for supporting hardware infrastructure at scale

Requirements For Production Systems Engineer, Fleet AI Systems

Linux
  • 7+ years of experience in hardware systems technologies or supporting production hardware at scale
  • 5+ years of troubleshooting and analytical experience in server system architecture
  • 3+ years of experience using Linux and scripting
  • Current experience in changing system configurations and measuring change impact
  • 3+ years of experience working in a matrix organization
  • 3+ years of experience engineering innovations in server system/data center products
  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience

Benefits For Production Systems Engineer, Fleet AI Systems

  • Medical Insurance
  • Dental Insurance
  • Vision Insurance
  • Bonus
  • Equity
  • Benefits

