Senior Compute SRE (GPU) - Apple Services Engineering

Apple

Apple is a technology company that designs, develops, and sells consumer electronics, computer software, and online services.

Seattle, WA, USA

$166,600 - $296,300

Site Reliability

Staff Software Engineer

In-Person

5,000+ Employees

5+ years of experience

AI · Enterprise SaaS

Description For Senior Compute SRE (GPU) - Apple Services Engineering

Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Join the Apple Services Engineering team as a Site Reliability Engineer to help support and scale cloud services for thousands of development and operations engineers. This is a hands-on role to maintain and improve SRE practices for a private cloud service to accelerate our ability to reliably and consistently deliver thousands of applications.

As a Sr. Site Reliability Engineer, you will be responsible for providing the platform that lets mission-critical cloud systems maintain constant uptime, scale seamlessly, and allow new applications and services to flourish. You will design and deploy GPU-accelerated VM and container infrastructure, implement GPU-based Kubernetes clusters, work with stakeholders to understand their requirements, implement best practices for security and scalability, monitor and optimize resource utilization, participate in capacity planning and disaster recovery exercises, troubleshoot issues across the entire infrastructure stack, and maintain relationships with vendors.

Key responsibilities include:

Designing and deploying GPU-accelerated VM and container infrastructure
Implementing GPU-based Kubernetes clusters
Working with data scientists and developers to provide solutions for GPU-accelerated tasks
Implementing best practices for security, scalability, and high availability
Monitoring and optimizing resource utilization
Participating in capacity planning and disaster recovery exercises
Troubleshooting issues across the entire infrastructure stack
Cultivating relationships with internal and external vendors

The ideal candidate will be highly self-motivated with a passion for excellence, quality, and detail. You will not only support operations but also work closely with developers and architects to improve stability, security, and scalability.

Last updated 2 months ago

Responsibilities For Senior Compute SRE (GPU) - Apple Services Engineering

Design and deploy GPU-accelerated VM and container infrastructure
Implement GPU-based Kubernetes clusters to support containerized applications and services
Work with data scientists, developers, and other stakeholders to provide solutions for GPU-accelerated tasks
Implement best practices for secure, scalable, and highly available environments
Monitor and optimize resource utilization to ensure performance and cost-efficiency
Participate in capacity planning, scale testing, and disaster recovery exercises
Troubleshoot issues across the entire infrastructure stack
Cultivate and maintain relationships with internal and external vendors, including third-party vendors

Requirements For Senior Compute SRE (GPU) - Apple Services Engineering

Kubernetes

Linux

5+ years in a Site Reliability Engineering, DevOps, or Infrastructure focused role
Proven experience with GPU-based virtual machine infrastructure and cloud platforms (e.g., AWS, GCP)
Experience with GPU hardware (e.g., NVIDIA, AMD) and associated software stack (e.g., CUDA, cuDNN)
Experience with GitOps, deployment strategies, and CI/CD tools such as Spinnaker and Argo
Ability to implement and coordinate telemetry using monitoring and observability tools such as Splunk, Grafana, and Prometheus
Outstanding organizational and communications skills

Benefits For Senior Compute SRE (GPU) - Apple Services Engineering

Medical Insurance

Dental Insurance

401k

Education Budget

Equity

Comprehensive medical and dental coverage
Retirement benefits
Discounted products and free services
Tuition reimbursement for job-related education
Discretionary restricted stock unit awards
Employee Stock Purchase Plan
Potential for discretionary bonuses or commission payments
Potential relocation assistance
