Site Reliability Engineer, AI/ML Platforms

Adobe is a global technology company that provides digital media and digital marketing software solutions.
$133,900 - $242,000
Site Reliability
Senior Software Engineer
In-Person
5+ years of experience
AI

Description For Site Reliability Engineer, AI/ML Platforms

Adobe is seeking a Site Reliability Engineer for its AI Training and Inference Platforms within Adobe Firefly. This role is part of a team of SREs working closely with Engineering teams on building, scaling, and securing the AI Platform. The platform enables Firefly product teams to manage and deploy Machine Learning capabilities used by Adobe client applications.

The platform will support thousands of models from Applied Research groups and App Teams across various lifecycle stages, offering ML model training and serving at scale with high cost efficiency across multiple cloud platforms. The role combines traditional SRE responsibilities with a specialized focus on AI/ML infrastructure.

As an SRE, you'll be responsible for ensuring platform reliability, implementing scalability solutions, and maintaining high uptime for Adobe's customers. You'll work with cutting-edge technologies in containerization, orchestration, and AI/ML frameworks while collaborating with various Adobe teams and cloud service providers.

The ideal candidate brings strong technical expertise in distributed systems, containerization, and infrastructure automation, combined with an understanding of AI/ML technologies. This role offers the opportunity to work on innovative AI platforms while solving complex technical challenges at scale.

The position offers competitive compensation ranging from $133,900 to $242,000 annually, based on location and experience. Join Adobe's team to help shape the future of AI infrastructure and contribute to groundbreaking technologies like Adobe Firefly.

Responsibilities For Site Reliability Engineer, AI/ML Platforms

  • Identify and implement methodologies and solutions to increase reliability, scalability, security, and efficiency
  • Ensure the highest uptime and Quality of Service (QoS) for Adobe's customers through operational excellence
  • Define service level objectives (SLOs) and indicators (SLIs) to represent and measure service quality
  • Support and maintain globally distributed, multi-cloud environments
  • Automate common, repeatable tasks at a large scale to streamline operational procedures
  • Identify areas to improve service resiliency through techniques such as chaos engineering and performance/load testing
  • Coordinate with other Adobe platform teams and service providers (primarily AWS) to innovate on Generative AI as a Service

Requirements For Site Reliability Engineer, AI/ML Platforms

Skills: Python, Go, Kubernetes
  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related field
  • 5+ years relevant industry experience
  • Ability to excel in ambiguous, loosely defined environments and enthusiasm for finding pragmatic solutions
  • Commitment to keeping up with industry trends and continually growing your knowledge and skills
  • Experience in building and scaling distributed systems
  • Production-level expertise with container orchestration engines
  • Fundamental programming skills in Python and Go
  • Good knowledge of infrastructure configuration management tools like Ansible and Terraform
  • Experience in using observability and tracing-related tools like InfluxDB, Prometheus, and Elastic Stack
  • Understanding of AI/ML, including ML frameworks, public cloud, and commercial AI/ML solutions
