Meta is seeking a Systems Engineer for its Release to Production (RTP) team working on the Meta Training and Inference Accelerator (MTIA) program, which supports large-scale AI training and inference. The role focuses on end-to-end hardware lifecycle management of Meta servers, including prototyping, debugging, and system monitoring, with a particular emphasis on scale-up and scale-out network technologies for the MTIA systems powering Meta's AI initiatives.
The ideal candidate will work closely with cross-functional teams, including hardware designers, networking teams, system manufacturers, and data center operations. They will be responsible for validating network interfaces at the protocol and system level and for managing system validation through to mass production. The role requires strong expertise in network protocols (TCP/IP, RDMA) and hands-on experience with post-silicon validation.
This is an exciting opportunity to join Meta's AI/ML initiatives and contribute to the infrastructure that powers its services. The position offers competitive compensation ranging from $163,000 to $225,000 annually, plus bonus, equity, and benefits. The role is based in Austin and focuses on hands-on system work and collaboration with a range of internal and external partners.
The successful candidate will have the opportunity to work on cutting-edge AI infrastructure, develop solutions for hardware health issues, and drive continuous improvement in product quality. This role is perfect for someone with a strong background in networking technologies and system engineering who wants to make an impact on Meta's AI infrastructure development.