LinkedIn is seeking a Sr. Software Engineer for Site Reliability to join its Streaming SRE team. This role combines development and operational responsibilities, ensuring the reliability of centralized pub/sub systems at LinkedIn. The team is responsible for maintaining one of the largest Streaming ecosystems on the planet, including Kafka, which processes over 50 trillion messages per day across more than 150 clusters.
Key Responsibilities:
- Serve as a primary point of contact responsible for the overall health, performance, and capacity of Internet-facing services
- Gain deep knowledge of complex applications
- Assist in rolling out and ramping up new product features and technologies
- Develop tools to improve deployment and monitoring of custom applications in a large-scale Linux environment
- Function well in a fast-paced, rapidly changing environment
- Collaborate with diverse teams including Developers, SREs, Network Engineers, and Product Managers
- Participate in a 24x7 rotation for second-tier escalations
Required Qualifications:
- B.S. or higher in Computer Science or related technical discipline, or equivalent practical experience
- 2+ years of experience with administration and troubleshooting of Unix/Linux systems
- Programming skills in one or more of Java, Rust, Go, Python, Ruby, C++
Preferred Qualifications:
- 5+ years in a UNIX-based large-scale web operations role
- Experience with Java or C++ development
- Experience in building systems automation and server backends with Python
- Experience with Linux OS (Azure Linux, Red Hat)
- Ability to troubleshoot production systems at scale
The role offers a hybrid work option, allowing flexibility to work from home and commute to the LinkedIn office as needed. This position will be based in Mountain View, CA.
LinkedIn is committed to fair and equitable compensation practices, with a pay range of $121,000 to $198,000 for this role. The actual compensation package will be based on various factors including skill set, experience, and work location.