Site Reliability Development at Google Cloud combines software and systems development to build and run large-scale, massively distributed, fault-tolerant systems. As a Site Reliability Developer, you'll be responsible for keeping Google Cloud's services as reliable and available as customers need while driving continuous improvement. The role involves managing challenges of scale unique to Google Cloud, drawing on expertise in coding, algorithms, complexity analysis, and large-scale system design.
The team culture emphasizes diversity, intellectual curiosity, problem-solving, and openness. Google brings together people with diverse backgrounds and perspectives, encouraging collaboration and risk-taking in a blame-free environment. The organization promotes self-direction on meaningful projects while providing support and mentorship for learning and growth.
You'll work on optimizing existing systems, building infrastructure, and automating processes. The role requires the technical expertise to manage project priorities, deadlines, and deliverables, and to design, develop, test, deploy, maintain, and enhance software solutions. You'll be part of a team that keeps a close watch on systems capacity and performance, ensuring that both internally critical and externally visible systems operate efficiently.
Google offers an inclusive work environment and is committed to equal opportunity employment, regardless of background. The company provides comprehensive benefits and supports work-life balance, making it an attractive destination for talented engineers who want to make an impact on large-scale systems.