AWS Neuron is seeking a Software Engineer to join its Machine Learning Applications team, focusing on distributed training solutions. The role sits within Annapurna Labs, which AWS acquired in 2015 and which builds infrastructure for AWS. The position involves working on AWS Neuron, the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators.
The role requires expertise in developing and optimizing distributed training support for major ML frameworks such as PyTorch and TensorFlow. You'll work closely with chip architects and compiler engineers to build efficient solutions for Trn1 systems. The position also involves performance tuning for a range of ML models, including large language models such as GPT-2 and GPT-3, as well as Stable Diffusion.
AWS offers a collaborative environment with strong emphasis on work-life balance and professional growth. The team values knowledge sharing and mentorship, providing opportunities to work on complex projects that impact millions of users. The company provides comprehensive benefits and promotes an inclusive culture through various employee-led affinity groups.
This is an excellent opportunity for engineers passionate about machine learning infrastructure who want to work at the intersection of hardware and software optimization. You'll be part of a team that's pushing the boundaries of ML acceleration and distributed computing, while enjoying the stability and resources of one of the world's leading tech companies.