AWS Utility Computing (UC) is at the forefront of cloud innovation, specifically within the Annapurna Labs division. This role focuses on the AWS Neuron software stack for AWS Inferentia and Trainium, our cloud-scale machine learning accelerators. As a machine learning engineer on the Distributed Training team, you'll be responsible for developing and optimizing a range of ML models, including large language models such as GPT and Llama, as well as Stable Diffusion and Vision Transformers.
The position requires expertise in distributed training libraries such as FSDP and DeepSpeed, working directly with custom AWS silicon. You'll collaborate with a diverse team of chip architects and engineers to build and enhance distributed training solutions. The role combines deep machine learning knowledge with strong software development skills.
AWS values diverse experiences and maintains an inclusive culture that celebrates knowledge-sharing and mentorship. The team supports professional growth through one-on-one mentoring and constructive code reviews. AWS pioneered cloud computing and continues to innovate, serving customers from startups to Global 500 companies.
The company emphasizes work-life harmony and provides comprehensive benefits including medical, financial, and equity compensation. Employee-led affinity groups and ongoing learning experiences, including Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, foster an inclusive environment where differences are celebrated.
This is an opportunity to work on cutting-edge ML infrastructure at scale, with base compensation ranging from $129,300 to $223,600 depending on location and experience. Join a team that's pushing the boundaries of what's possible in cloud computing and machine learning acceleration.