NVIDIA is seeking a Senior Production SRE Engineer focused on storage systems to join its team. This role combines software engineering practices with systems operations to design, build, and maintain large-scale production systems. The position requires expertise in storage, data management, and cloud services, with a focus on ensuring high reliability and uptime for NVIDIA's GPU cloud services.
The role involves working with cutting-edge AI/ML workloads and managing large-scale storage clusters. You'll be part of a diverse and collaborative team that values intellectual curiosity and problem-solving. The position offers opportunities to work with advanced technologies while maintaining and scaling critical infrastructure.
Key responsibilities include designing and implementing storage solutions, supporting AI/ML workloads, and ensuring system reliability through monitoring and automation. You'll collaborate with various teams, participate in on-call rotations, and contribute to system design and improvement initiatives.
The ideal candidate has strong experience with Linux systems, programming languages such as Python or Go, and infrastructure management tools. Knowledge of Kubernetes, containers, and observability tools is highly valued. NVIDIA offers competitive compensation, including a base salary range of $148,000 to $339,250, plus equity and benefits.
This role is perfect for someone who enjoys tackling complex technical challenges, has a strong SRE mindset, and wants to work at the intersection of storage systems and AI technology. Join NVIDIA to be part of a team that's driving innovation in accelerated computing and transforming major industries through AI and digital twins.