Senior Compute SRE (GPU) - Apple Services Engineering

Apple is a technology company that designs, develops, and sells consumer electronics, computer software, and online services.
$166,600 - $296,300
Site Reliability
Staff Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI · Enterprise SaaS

Description For Senior Compute SRE (GPU) - Apple Services Engineering

Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Join the Apple Services Engineering team as a Site Reliability Engineer to help support and scale cloud services for thousands of development and operations engineers. In this hands-on role, you will maintain and improve SRE practices for a private cloud service, accelerating our ability to deliver thousands of applications reliably and consistently.

As a Sr. Site Reliability Engineer, you will be responsible for providing the platform for mission-critical cloud systems to maintain constant uptime, scale seamlessly, and allow for new applications and services to flourish. You will design and deploy GPU-accelerated VM and container infrastructure, implement GPU-based Kubernetes clusters, work with stakeholders to understand requirements, implement best practices for security and scalability, monitor and optimize resource utilization, participate in capacity planning and disaster recovery exercises, troubleshoot issues across the entire infrastructure stack, and maintain relationships with vendors.

Key responsibilities include:

  • Designing and deploying GPU-accelerated VM and container infrastructure
  • Implementing GPU-based Kubernetes clusters
  • Working with data scientists and developers to provide solutions for GPU-accelerated tasks
  • Implementing best practices for security, scalability, and high availability
  • Monitoring and optimizing resource utilization
  • Participating in capacity planning and disaster recovery exercises
  • Troubleshooting issues across the entire infrastructure stack
  • Cultivating relationships with internal and external vendors

The ideal candidate will be highly self-motivated with a passion for excellence, quality, and detail. You will not only support operations but also work closely with developers and architects to improve stability, security, and scalability.

Last updated 2 months ago

Responsibilities For Senior Compute SRE (GPU) - Apple Services Engineering

  • Design and deploy GPU-accelerated VM and container infrastructure
  • Implement GPU-based Kubernetes clusters to support containerized applications and services
  • Work with data scientists, developers, and other stakeholders to provide solutions for GPU-accelerated tasks
  • Implement best practices for secure, scalable, and highly available environments
  • Monitor and optimize resource utilization to ensure performance and cost-efficiency
  • Participate in capacity planning, scale testing, and disaster recovery exercises
  • Troubleshoot issues across the entire infrastructure stack
  • Cultivate and maintain relationships with internal and external third-party vendors

Requirements For Senior Compute SRE (GPU) - Apple Services Engineering

Kubernetes · Linux · Go
  • 5+ years in a Site Reliability Engineering, DevOps, or Infrastructure focused role
  • Proven experience with GPU-based virtual machine infrastructure and cloud platforms (e.g., AWS, GCP)
  • Experience with GPU hardware (e.g., NVIDIA, AMD) and associated software stack (e.g., CUDA, cuDNN)
  • Experience with GitOps, deployment strategies, and CI/CD tools such as Spinnaker and Argo
  • Ability to implement and coordinate telemetry using monitoring and observability tools such as Splunk, Grafana, and Prometheus
  • Outstanding organizational and communication skills

Benefits For Senior Compute SRE (GPU) - Apple Services Engineering

Medical Insurance · Dental Insurance · 401k · Education Budget · Equity
  • Comprehensive medical and dental coverage
  • Retirement benefits
  • Discounted products and free services
  • Tuition reimbursement for job-related education
  • Discretionary restricted stock unit awards
  • Employee Stock Purchase Plan
  • Potential for discretionary bonuses or commission payments
  • Potential relocation assistance
