Senior Software Engineer, TPU Supercomputer

Google is a global technology company that builds infrastructure and platforms powering its extensive product portfolio.
Distributed Systems
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
AI

Description For Senior Software Engineer, TPU Supercomputer

Google is seeking a Senior Software Engineer to join its Technical Infrastructure team, focusing specifically on TPU Supercomputer systems. This role is at the heart of Google's AI infrastructure, working on the technology that powers Google's machine learning capabilities. The position involves designing and maintaining software systems for TPU supercomputers and requires expertise in C++ and distributed systems.

The role offers an exciting opportunity to work with state-of-the-art AI infrastructure, managing both computing and networking components of Google's AI supercomputers. You'll be responsible for creating system-level tools and ensuring the reliability of the entire TPU stack. This position requires collaboration with various teams, including Silicon, Software, SRE, and Operations, making it perfect for someone who enjoys cross-functional teamwork.

The ideal candidate will bring at least 5 years of C++ development experience and 3 years of distributed systems expertise. Knowledge of cloud platforms, machine learning frameworks, and networking protocols is highly valued. This role offers the chance to work on technology that directly impacts Google's AI capabilities and future innovations.

Working at Google means joining a company that values diversity, inclusion, and innovation. The Technical Infrastructure team takes pride in being the "engineers' engineers," solving complex problems that make Google's vast product portfolio possible. This role provides an opportunity to work on cutting-edge technology while collaborating with some of the brightest minds in the industry.

Last updated 4 days ago

Responsibilities For Senior Software Engineer, TPU Supercomputer

  • Design and maintain TPU supercomputer software across different layers of the software stack to control, monitor, build, deploy, qualify and serve the TPU supercomputing systems
  • Manage the whole lifecycle across both computing and networking components for Google's AI supercomputers/hypercomputers
  • Create system-level debuggability and observability tools in partnership with key stakeholders, ensuring alignment and functionality across the stack
  • Collaborate with Silicon, Software, Site Reliability Engineering (SRE), and Operations teams to drive reliability improvements and address issues across the entire TPU stack

Requirements For Senior Software Engineer, TPU Supercomputer

  • Bachelor's degree or equivalent practical experience
  • 5 years of experience in system software development with C++
  • 3 years of experience in distributed systems
  • Master's degree or PhD in Computer Science or related technical field (preferred)
  • Experience with production monitoring, logging, and observability tools (preferred)
  • Experience with cloud platforms and technologies (e.g., GCP) (preferred)
  • Experience with machine learning concepts and frameworks (e.g., TensorFlow) (preferred)
  • Experience with data analysis and SQL (preferred)
  • Knowledge of networking protocols and technologies (preferred)
