Meta is seeking software engineers to enhance our AI inference infrastructure. As a member of the team, you will play a crucial role in reducing the latency and power consumption of our AI models and in building user-facing APIs for ML engineers. This position requires expertise in both machine learning and software engineering.
Responsibilities include:
- Fine-tuning, quantizing, and deploying ML models on-device across phones and AR/VR devices
- Optimizing models for latency and power consumption
- Enabling efficient inference on GPUs
- Building tooling to develop and deploy efficient models for inference
- Partnering with teams across Meta Reality Labs to optimize key inference workloads
Minimum Qualifications:
- Bachelor's degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience
- PhD in Computer Science, Computer Engineering, or a related field (completed or in progress)
- Specialized experience in model quantization, model compression, on-device inference, GPU inference, and PyTorch
- Must obtain and maintain work authorization in the country of employment
Preferred Qualifications:
- Proven track record of training, fine-tuning, and optimizing models
- 3+ years of experience accelerating deep learning models for on-device inference
- Experience optimizing machine learning model inference on NVIDIA GPUs
- Familiarity with on-device inference platforms (e.g., ARM CPUs, Qualcomm DSPs)
- Experience with CUDA and/or Triton
Meta offers a competitive compensation package, including benefits, and is committed to providing reasonable accommodations for candidates with disabilities or other needs.