NVIDIA is seeking a Senior Manager for its Storage Production Engineering team to lead Site Reliability Engineering (SRE) initiatives. The role combines technical leadership with people management and focuses on designing and maintaining large-scale production systems, with an emphasis on storage.
The position requires 10+ years of experience, including 5+ years in management, and a candidate who can pair deep technical expertise with effective team leadership. You'll oversee critical storage infrastructure that supports NVIDIA's internal and external GPU cloud services, working with technologies such as cloud-native storage and Kubernetes.
In this role, you'll drive strategic initiatives to improve storage system reliability and performance while managing a team of Storage SRE professionals. Your responsibilities include technical architecture decisions, team development, incident response management, and automation that improves operational efficiency.
The role offers the opportunity to work at the forefront of AI computing as NVIDIA continues to advance parallel computing and deep learning. You'll join the company that invented the GPU and now leads in AI computing. Compensation ranges from $272,000 to $425,500, plus equity and benefits.
This is an ideal role for someone who combines deep technical knowledge of storage systems with strong leadership capabilities, and who is passionate about building and maintaining robust, scalable infrastructure. You'll work in a collaborative environment that values innovation and technical excellence, with the opportunity to make a significant impact on systems that power some of the most advanced AI and ML solutions in the world.