Production Systems Engineer, AI Systems

Meta

Meta builds technologies that help people connect, find communities, and grow businesses through social technology and immersive experiences.

Austin, TX, USA

$163,000 - $225,000

Cloud

Senior Software Engineer

In-Person

5,000+ Employees

6+ years of experience

AI · Enterprise SaaS

Description For Production Systems Engineer, AI Systems

Meta is seeking a Systems Engineer for its Release to Production (RTP) team, working on the Meta Training and Inference Accelerator (MTIA) program that supports large-scale AI training and inference. The role focuses on end-to-end hardware lifecycle management of Meta servers, including prototyping, debugging, and system monitoring. The position specifically emphasizes scale-up and scale-out network technologies for the MTIA systems powering Meta's AI initiatives.

The ideal candidate will work closely with cross-functional teams, including hardware designers, networking teams, system manufacturers, and data center operations. They will be responsible for validating network interfaces at the protocol and system level and for managing system validation through to mass production. The role requires strong expertise in network protocols (TCP/IP, RDMA) and hands-on experience with post-silicon validation.

This is an exciting opportunity to join Meta's AI/ML initiatives and contribute to the infrastructure that powers their innovative services. The position offers competitive compensation ranging from $163,000 to $225,000 annually, plus bonus, equity, and benefits. The role is based in Austin, with a focus on hands-on system work and collaboration with various internal and external partners.

The successful candidate will have the opportunity to work on cutting-edge AI infrastructure, develop solutions for hardware health issues, and drive continuous improvement in product quality. This role is perfect for someone with a strong background in networking technologies and system engineering who wants to make an impact on Meta's AI infrastructure development.

Last updated a month ago

Responsibilities For Production Systems Engineer, AI Systems

Support the introduction of new MTIA platforms into the Meta fleet by working with the post-silicon validation team
Create experiments and tooling to detect, reproduce, and diagnose hardware, firmware, and software health issues
Develop an understanding of AI workload traffic and incorporate it into new product introduction (NPI)
Contribute to enabling hacks for future technology explorations in the AI space
Troubleshoot, diagnose, and root-cause system failures
Develop visibility through data visualization and implement systemic solutions
Drive external and internal teams to continuously improve product quality

Requirements For Production Systems Engineer, AI Systems

Linux

Bachelor's degree in Engineering or Computer Science
6+ years of work experience in Network ASIC development, Network Product deployment, or Interconnect Technologies
Knowledge of server architecture and components
Experience working with Linux
Knowledge of TCP/IP and experience using iperf
Hands-on troubleshooting and debugging experience

Benefits For Production Systems Engineer, AI Systems

bonus
equity
benefits

Meta

How can you modify a list in-place to move even numbers to the front and odd numbers to the end efficiently, without preserving relative order?

Data Structures & Algorithms · Hard

You are given a list of integers. Modify the list in-place so that all even numbers appear at the beginning of the list and all odd numbers appear at the end. The relative order within the even numbers and within the odd numbers does not matter.

Examples:

Input [3, 1, 2, 4, 6, 7]: a valid output is [2, 4, 6, 3, 1, 7]; another valid output is [4, 2, 6, 1, 3, 7]. The even numbers (2, 4, 6) are at the beginning and the odd numbers (3, 1, 7) are at the end.
Input [1, 3, 5, 7]: the output should be [1, 3, 5, 7], since all numbers are already odd.
Input [2, 4, 6, 8]: the output should be [2, 4, 6, 8], since all numbers are already even.
Input []: the output should be [].

Write a function that takes a list of integers as input and modifies it in-place to satisfy these conditions. Your solution should be efficient, ideally running in O(n) time and O(1) extra space.

Arrays

Two Pointers
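
One possible approach to the even/odd question above is a two-pointer partition, sketched here in Python. This is an illustrative sketch rather than an official answer; the function name partition_even_odd is an assumption, not something the listing specifies. One index walks forward past even values, the other walks backward past odd values, and the two swap whenever both are stuck on a misplaced element.

```python
def partition_even_odd(nums: list[int]) -> None:
    """Move even numbers to the front and odd numbers to the back, in place.

    Hypothetical helper name. Runs in O(n) time with O(1) extra space and
    does not preserve the relative order of the elements.
    """
    left, right = 0, len(nums) - 1
    while left < right:
        if nums[left] % 2 == 0:
            # Even value is already in the front region; advance.
            left += 1
        elif nums[right] % 2 == 1:
            # Odd value is already in the back region; retreat.
            # (In Python, x % 2 is 1 for negative odd integers as well.)
            right -= 1
        else:
            # nums[left] is odd and nums[right] is even: swap them.
            nums[left], nums[right] = nums[right], nums[left]
            left += 1
            right -= 1
```

Each element is examined a constant number of times, so the loop is linear in the list length. Because the problem does not require stable ordering, the swap-based partition avoids the extra buffer that a stable version would typically need.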

Meta

How would you design a duplicate file finder, discussing complexity, optimizations, and trade-offs?

System Design · Hard

Let's design a system utility to find duplicate files on a given file system. The utility should traverse a directory, compute a hash for each file, and identify duplicates based on these hashes. Discuss the algorithmic complexity, potential optimizations, and trade-offs between CPU and RAM usage. What is the theoretical best performance we can achieve?

Arrays

Strings

Greedy Algorithms

Dynamic Programming

Graphs

Trees
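
A minimal Python sketch of the duplicate file finder described above, under stated assumptions: the function names file_digest and find_duplicates, the choice of SHA-256, and the 1 MiB chunk size are all illustrative and not part of the question. It adds one common optimization on top of the plain "hash every file" design: files are first bucketed by size, so only files that share a size with another file are ever read and hashed.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: str) -> list[list[str]]:
    """Return groups of paths under root whose contents hash identically."""
    # Pass 1: bucket by size. Files of different sizes cannot be duplicates,
    # and reading a size from metadata is far cheaper than reading contents.
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so the same file is not counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    groups: list[list[str]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash: dict[str, list[str]] = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

The trade-offs the question asks about show up directly here. Chunked hashing keeps RAM use constant per file at the cost of CPU time spent hashing, while the size pre-filter trades a small in-memory index for skipping most disk reads. In the worst case, where many files share a size, the work is still linear in the total bytes of the candidate files, since each candidate has to be read at least once to confirm or rule out a match.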

Meta

Tell me about a time you disagreed with a manager.

Behavioral

Tell me about a time you disagreed with a manager. What was the situation, and how did you handle it? What was the outcome, and what did you learn from the experience? For example, perhaps you had a disagreement about project priorities, a technical approach, or a team management style. It could be a situation where you felt strongly about a different course of action and had to navigate the disagreement professionally. Describe the specific steps you took to communicate your perspective, understand your manager's viewpoint, and reach a resolution. What factors did you consider in deciding how to approach the situation, and what alternative approaches did you consider? How did you ensure that the disagreement didn't negatively impact your working relationship or the overall team dynamic? What would you do differently in a similar situation in the future?

Jobs Related To Meta Production Systems Engineer, AI Systems

SiteOps Data Center Capacity Engineer

Meta

Senior Data Center Capacity Engineer role at Meta, managing infrastructure growth and capacity planning for global data centers.

Network Engineer, Deployment & Support

Meta

Senior Network Engineer position at Meta focusing on deployment and support of global network infrastructure, offering competitive compensation and opportunities to work with cutting-edge technologies.

Production Systems Engineer, AI Systems

Meta

Senior Systems Engineer role at Meta focusing on AI infrastructure, network technologies, and hardware lifecycle management for large-scale AI systems.

Network Engineer, Deployment & Support

Meta

Senior Network Engineer position at Meta focusing on deployment and support of datacenter infrastructure and network operations.

Senior Software Developer, Google Cloud AI

Google

Senior Software Developer position at Google Cloud AI, focusing on developing enterprise-grade solutions and next-generation technologies that serve billions of users worldwide.