LinkedIn is seeking a Principal Staff Software Engineer to join the AI Training team within its AI Platform group. This role is pivotal in developing and maintaining highly available and scalable deep learning training solutions that power LinkedIn's expanding AI initiatives. The team is responsible for scaling the training of AI models with hundreds of billions of parameters across recommendation systems, large language models (Generative AI), and computer vision models.
The position involves working with cutting-edge technologies and optimizing training performance across multiple dimensions: algorithms, AI frameworks, infrastructure software, and hardware. The team manages thousands of the latest GPU cards and collaborates closely with the open source community; many team members are active contributors to projects like TensorFlow, Horovod, Ray, and Hadoop.
As a Principal Staff Software Engineer, you'll lead the development of next-generation training infrastructure, focusing on high-performance AI training pipelines and data I/O optimization, and working with popular libraries like Hugging Face, Horovod, and PyTorch. You'll be responsible for debugging and optimizing deep learning training and for implementing advanced features like model parallelism, data parallelism, ZeRO, and automatic mixed precision.
The role offers the opportunity to work with state-of-the-art AI technologies, including LLMs, GNNs, and advanced LLM agents. You'll be instrumental in developing containerized pipeline orchestration infrastructure and maintaining deep learning frameworks. The position combines technical leadership with hands-on development, requiring both architectural vision and practical implementation skills.
LinkedIn offers a collaborative environment where innovation is encouraged, and your work will directly impact millions of users worldwide. The company provides competitive compensation, comprehensive benefits, and the opportunity to work with leading experts in AI and distributed systems. This role is perfect for someone passionate about large-scale AI infrastructure who wants to shape the future of professional networking through advanced machine learning technologies.