AWS Neuron is seeking a Senior Software Engineer to join its Machine Learning Applications (ML Apps) team. This role focuses on developing and optimizing software for AWS Inferentia and Trainium, Amazon's cloud-scale machine learning accelerators. The position involves working with cutting-edge ML models, including GPT-2, GPT-3, Stable Diffusion, and Vision Transformers. You'll collaborate with chip architects and engineers to build distributed training solutions on Trn1 instances, implementing support for FSDP, DeepSpeed, and other distributed training libraries.
The role calls for deep expertise in both software development and machine learning, including proficiency in C++ and Python and extensive knowledge of ML model architectures. You'll be responsible for building distributed inference support in frameworks such as PyTorch and TensorFlow, while ensuring optimal performance on AWS Trainium and Inferentia silicon.
Amazon offers a competitive compensation package ranging from $151,300 to $261,500 based on location, plus equity and comprehensive benefits. The team values work-life balance and provides a supportive environment focused on mentorship and professional growth. You'll join a diverse team that celebrates knowledge sharing and maintains high standards through thorough, constructive code reviews.
This position offers an opportunity to work at the forefront of machine learning infrastructure, developing solutions that will power the next generation of AI applications. It pairs technical leadership with hands-on development, making it ideal for experienced engineers passionate about machine learning and distributed systems.