Meta is seeking a Production Systems Engineer to join its Release to Production (RTP) team, focusing on Fleet AI Systems. The role sits at the intersection of hardware infrastructure and AI systems and requires technical expertise in both domains.
The position involves working with Meta's extensive server infrastructure and data centers, which form the backbone of its rapidly scaling operations. As a Production Systems Engineer, you'll be responsible for the complete hardware lifecycle of Meta's servers, including pre-production testing, system debugging, and implementing monitoring solutions.
The role requires collaboration with hardware designers, system manufacturers, component vendors, and various internal engineering teams. You'll help ensure that systems are thoroughly tested before deployment to production data centers, and you'll maintain the health and lifecycle of servers already in production.
Key responsibilities include developing and executing test suites for various architectures, creating automated testing frameworks, and implementing system monitoring solutions. You'll also be involved in troubleshooting complex hardware and software issues, developing data visualization tools, and establishing best practices for hardware infrastructure management at scale.
The ideal candidate should have extensive experience (7+ years) in hardware systems technologies or supporting production hardware at scale, strong Linux expertise, and a proven track record in server system architecture. Experience with AI/HPC systems at scale is particularly valuable for this role.
Meta offers a competitive compensation package ranging from $163,000 to $225,000 per year, plus bonus and equity opportunities. The position is based in Menlo Park, CA, at Meta's headquarters, where you'll work with cutting-edge technology and contribute to the infrastructure supporting billions of users worldwide.
This role offers an opportunity to work on large-scale technology infrastructure, combining hardware expertise with AI systems knowledge while contributing to Meta's mission of connecting people and building the future of social technology. It provides exposure to large-scale systems and the chance to solve complex technical challenges in a fast-paced environment.