AWS Utility Computing (UC) is seeking a talented Machine Learning Engineer to join its Distributed Training team for AWS Neuron. This role sits at the intersection of cutting-edge AI technology and cloud computing and works with AWS's custom silicon, Inferentia and Trainium. You'll be responsible for developing and optimizing distributed training solutions for large-scale ML models, including LLMs like GPT and Llama, as well as Stable Diffusion and Vision Transformers.
The position offers a unique opportunity to work directly with chip architects, compiler engineers, and runtime engineers, creating solutions that push the boundaries of what's possible in machine learning. You'll be part of AWS's innovative culture, working on products that continue to set AWS's services apart in the industry.
The team culture emphasizes knowledge-sharing, mentorship, and inclusive practices. AWS values diverse experiences and backgrounds, offering various employee-led affinity groups and ongoing learning experiences. The company provides comprehensive benefits, emphasizes work-life harmony, and offers competitive compensation ranging from $129,300 to $223,600 based on location and experience.
This role requires strong software development skills combined with deep machine learning knowledge. You'll work with technologies like FSDP, DeepSpeed, PyTorch, and TensorFlow, and have the opportunity to contribute to Amazon's growing suite of generative AI services. The position offers excellent career growth opportunities, with senior team members providing one-on-one mentoring and thorough code reviews.
If you're passionate about machine learning, have strong software development skills, and want to work on technology that helps customers solve previously unimaginable challenges, this role offers an exciting opportunity to make a significant impact in the field of AI and cloud computing.