Adobe is seeking a Site Reliability Engineer for its AI Training and Inference Platforms within Adobe Firefly. This role is part of a team of SREs that works closely with Engineering teams to build, scale, and secure the AI Platform. The platform enables Firefly product teams to manage and deploy the Machine Learning capabilities used by Adobe client applications.
The platform will support thousands of models from Applied Research groups and App Teams across various lifecycle stages, offering ML model training and serving at scale with high cost efficiency across multiple cloud platforms. The role combines traditional SRE responsibilities with a specialized focus on AI/ML infrastructure.
As an SRE, you'll be responsible for ensuring platform reliability, implementing scalability solutions, and maintaining high uptime for Adobe's customers. You'll work with cutting-edge technologies in containerization, orchestration, and AI/ML frameworks while collaborating with various Adobe teams and cloud service providers.
The ideal candidate brings strong technical expertise in distributed systems, containerization, and infrastructure automation, combined with an understanding of AI/ML technologies. This role offers the opportunity to work on innovative AI platforms while solving complex technical challenges at scale.
The position offers competitive compensation ranging from $133,900 to $242,000 annually, based on location and experience. Join Adobe's team to help shape the future of AI infrastructure and contribute to groundbreaking technologies like Adobe Firefly.