Sr. Staff Software Engineer, AI Training Platform

$180,000 - $300,000
Machine Learning
Staff Software Engineer
Hybrid
1,000 - 5,000 Employees
10+ years of experience
AI

Description For Sr. Staff Software Engineer, AI Training Platform

LinkedIn is the world's largest professional network, built to help members of all backgrounds and experiences achieve more in their careers. Our vision is to create economic opportunity for every member of the global workforce. Every day our members use our products to make connections, discover opportunities, build skills and gain insights. We believe amazing things happen when we work together in an environment where everyone feels a true sense of belonging, and that what matters most in a candidate is having the skills needed to succeed. It inspires us to invest in our talent and support career growth. Join us to challenge yourself with work that matters.

This role will be based in Mountain View, CA, San Francisco, CA or Bellevue, WA. At LinkedIn, we trust each other to do our best work where it works best for us and our teams. This role offers hybrid work options, meaning you can work from home and commute to a LinkedIn office, depending on what's best for you and when your team needs to be together.

As part of LinkedIn's AI Platform group, the AI Training team is responsible for developing and maintaining highly available and scalable deep learning training solutions to power our rapidly growing AI use cases. The team is responsible for scaling LinkedIn's AI model training with hundreds of billions of parameters for all AI use cases, from recommendation models and large language models (Generative AI) to computer vision models. We optimize training performance across algorithms, AI frameworks, infrastructure software, and hardware to harness the power of our GPU fleet with thousands of the latest GPU cards. The team also works closely with the open source community and has many open source committers (TensorFlow, Horovod, Ray, Hadoop, etc.) on the team. Additionally, this team focuses on technologies like LLMs, GNNs, Incremental Learning, and Online Learning, and on advanced LLM agent work for training infrastructure.

As a Senior Staff Software Engineer on the AI Training Infra team, you will play a crucial role in leading and building the next-gen training infrastructure to power AI use cases. You will design and implement high-performance AI training pipelines and data I/O; work with open source teams to identify and resolve issues in popular libraries like Hugging Face, Horovod and PyTorch; debug and optimize deep learning training; and provide advanced support for internal AI teams in areas like model parallelism, data parallelism, ZeRO, automatic mixed precision and kernel fusion. Finally, you will assist in and guide the development of containerized pipeline orchestration infrastructure, including developing and distributing stable base container images, providing advanced profiling and observability, and updating internally maintained versions of deep learning frameworks and their companion libraries like TensorFlow, PyTorch, DeepSpeed, GNNs, Flash Attention and more.

Last updated a month ago

Responsibilities For Sr. Staff Software Engineer, AI Training Platform

  • Owning the technical strategy for broad or complex requirements with insightful and forward-looking approaches that go beyond the direct team and solve large open-ended problems.
  • Designing, implementing, and optimizing the performance of large-scale distributed training for personalized recommendation as well as large language models.
  • Improving the observability and understandability of various systems with a focus on improving developer productivity and system sustenance.
  • Mentoring other engineers, defining our challenging technical culture, and helping to build a fast-growing team.
  • Working closely with the open-source community to participate in and influence cutting-edge open-source projects (e.g., PyTorch, GNNs, DeepSpeed, Hugging Face, etc.).
  • Functioning as the tech-lead for several concurrent key initiatives for the Training Infrastructure and defining the future of AI training platforms.

Requirements For Sr. Staff Software Engineer, AI Training Platform

  • BS/BA in Computer Science or related technical field or equivalent technical experience
  • 5+ years of industry experience in software design, development, and algorithm-related solutions
  • 5+ years of experience programming in object-oriented languages such as Python, C++, Java, Go, Rust, or Scala
  • 2+ years of experience as an architect or in a technical leadership position
  • 5+ years of experience in the industry with leading / building deep learning systems
  • Hands-on experience developing distributed systems or other large-scale systems

Benefits For Sr. Staff Software Engineer, AI Training Platform

  • Medical insurance
  • Vision insurance
  • Dental insurance
  • 401(k)
  • Paid maternity leave
  • Child care support
  • Paid paternity leave
  • Commuter benefits
  • Student loan assistance
  • Tuition assistance
  • Disability insurance
