Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Join the Apple Services Engineering team as a Site Reliability Engineer to help support and scale cloud services used by thousands of development and operations engineers. This is a hands-on role maintaining and improving SRE practices for a private cloud service, accelerating our ability to deliver thousands of applications reliably and consistently.
As a Sr. Site Reliability Engineer, you will be responsible for providing the platform that allows mission-critical cloud systems to maintain constant uptime, scale seamlessly, and let new applications and services flourish.
Key responsibilities include:
- Designing and deploying GPU-accelerated VM and container infrastructure
- Implementing GPU-based Kubernetes clusters
- Working with stakeholders to understand their requirements
- Implementing best practices for security and scalability
- Monitoring and optimizing resource utilization
- Participating in capacity planning and disaster recovery exercises
- Troubleshooting issues across the entire infrastructure stack
- Maintaining relationships with vendors
The ideal candidate will be highly self-motivated with a passion for excellence, quality, and detail. You will not only support operations but also work closely with developers and architects to improve stability, security, and scalability.