NVIDIA is seeking a Senior Site Reliability Engineer to join its cloud service team, focusing on building and supporting generative AI-powered visual applications. This role combines the excitement of working with cutting-edge AI technology with the challenges of maintaining high-performance, globally distributed systems. You'll be responsible for managing infrastructure across 60+ edge locations and major cloud providers, ensuring optimal performance of AI workloads on NVIDIA's GPU architectures.
The position offers a unique opportunity to work at the intersection of AI and infrastructure, requiring both deep technical expertise and strategic thinking. You'll be implementing SRE practices crucial to product quality, including proactive outage prevention, blameless postmortems, and continuous service improvement. The role involves collaboration with various teams, from service owners to research groups, making it ideal for someone who enjoys both technical challenges and cross-functional teamwork.
As an NVIDIAN, you'll be part of a company that has been at the forefront of innovation for over 25 years and is currently leading the charge in generative AI development. The role offers exposure to groundbreaking technologies and the chance to work with some of the industry's best talent in a diverse, encouraging environment. This position is perfect for someone who combines strong SRE fundamentals with an interest in AI technologies and a desire to shape the future of computing.
The ideal candidate will bring extensive experience in production environments, strong coding skills, and a deep understanding of cloud technologies. Knowledge of AI/ML technologies and experience with containerization for AI models would be particularly valuable. You'll be joining a company that's widely recognized as one of technology's most desirable employers, offering the opportunity to work on projects that are defining the next era of computing.