Google Cloud is seeking a Staff Software Engineer to join its Cloud ML Compute Services team, focusing on building and supporting the Google Cloud Platform (GCP) Cloud TPU and GPU services. The role contributes to next-generation technologies that change how billions of users interact with information and with each other. The position involves working on AI infrastructure as part of the team responsible for the ML frameworks, tools, models, and processes that deliver scale and performance for ML workloads in Google Cloud.
The ideal candidate will have extensive experience in software development, machine learning, and technical leadership. You'll work on projects critical to Google Cloud's needs, with opportunities to switch teams and projects as the business evolves. The role requires expertise in high-performance computing, particularly with TPUs and GPUs, and deep knowledge of machine learning frameworks such as PyTorch and JAX.
As a Staff Software Engineer, you'll be at the forefront of AI infrastructure development, working with both high-level Python and low-level C++ implementations. You'll collaborate with various teams to improve LLM training and inference performance, develop new features, and work directly with Cloud TPU power users to solve complex technical challenges.
The position offers competitive compensation: a base salary range of $189,000-$284,000 plus bonus, equity, and comprehensive benefits. This is an excellent opportunity for someone passionate about machine learning infrastructure who wants to make a significant impact on how organizations use Google's technology for their ML workloads.