Staff Engineer, Site Reliability

LinkedIn is the world's largest professional network, built to help members of all backgrounds and experiences achieve more in their careers.
$147,000 - $240,000
Site Reliability
Staff Software Engineer
Hybrid
5+ years of experience
Enterprise SaaS

Description For Staff Engineer, Site Reliability

LinkedIn is the world's largest professional network, built to help members of all backgrounds and experiences achieve more in their careers. Our vision is to create economic opportunity for every member of the global workforce. Every day our members use our products to make connections, discover opportunities, build skills and gain insights. We believe amazing things happen when we work together in an environment where everyone feels a true sense of belonging, and that what matters most in a candidate is having the skills needed to succeed. It inspires us to invest in our talent and support career growth. Join us to challenge yourself with work that matters.

About Traffic SRE

Traffic is responsible for delivering LinkedIn products and services to everyone on the Internet. Our team operates the edge of LinkedIn's data centers, a massive infrastructure that serves over 1 billion members and millions of requests per second. We develop and manage Layer 4 and Layer 7 network proxies, load balancers, service discovery, monitoring, and CI/CD pipelines. We're looking for engineers who want to solve challenging problems at the pace of LinkedIn's accelerating growth. It's highly visible work that impacts our site every day, and we try to do it in a way that makes our lives easier. Come join us in this mission supporting LinkedIn's reach to every member of the global workforce.

At LinkedIn, we trust each other to do our best work where it works best for us and our teams. This role offers a hybrid work option, meaning you can both work from home and commute to a LinkedIn office, depending on what's best for you and when it is important for your team to be together. This role will be based in Sunnyvale, CA.

Responsibilities:
● Serve as a primary point responsible for the overall health, performance, and capacity of one or more of our Internet-facing services.
● Gain deep knowledge of our complex applications.
● Develop and assist in the roll-out and deployment of new product features and installations to facilitate our rapid iteration and constant growth.
● Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a large-scale Linux environment.
● Work closely with partner teams to ensure that platforms are designed with "operability" in mind.
● Function well in a fast-paced, rapidly changing environment.
● Participate in a 24x7 rotation for second-tier escalations.

Basic Qualifications:
● B.S. or higher in Computer Science or other technical discipline, or related practical experience.
● 2+ years of experience operating and troubleshooting Linux at scale.
● Programming skills (Go, C/C++, Python, Rust, Java).

Preferred Qualifications:
● 5+ years in a UNIX-based large-scale web operations role. Experience with Kubernetes, Azure, and on-prem environments.
● Experience with reverse proxies and load balancers (ATS, HAProxy, IPVS, etc.) and TCP/IP networking.
● Experience with C/C++, Go, or Python server development, performance, and troubleshooting.
● Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a globally diverse, team-focused environment with other SREs, SWEs, Product Managers, etc.
● Knowledge of most of these: data structures, relational and non-relational databases, networking, Linux internals, filesystems, web architecture, and related topics.

Suggested Skills:

  1. Go, C/C++, Python, Rust, Java development and troubleshooting
  2. Linux systems and TCP/IP network troubleshooting
  3. High availability web services
  4. Edge/Traffic networking
Last updated 3 months ago

Responsibilities For Staff Engineer, Site Reliability

  • Serve as a primary point responsible for the overall health, performance, and capacity of one or more of our Internet-facing services
  • Gain deep knowledge of our complex applications
  • Develop and assist in the roll-out and deployment of new product features and installations
  • Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a large-scale Linux environment
  • Work closely with partner teams to ensure that platforms are designed with 'operability' in mind
  • Function well in a fast-paced, rapidly changing environment
  • Participate in a 24x7 rotation for second-tier escalations

Requirements For Staff Engineer, Site Reliability

Go
Java
Kubernetes
Linux
Python
  • B.S. or higher in Computer Science or other technical discipline, or related practical experience
  • 2+ years of experience operating and troubleshooting Linux at scale
  • Programming skills (Go, C/C++, Python, Rust, Java)
  • 5+ years in a UNIX-based large-scale web operations role
  • Experience with Kubernetes, Azure, and on-prem environments
  • Experience with reverse proxies and load balancers (ATS, HAProxy, IPVS, etc.) and TCP/IP networking
  • Experience with C/C++, Go, or Python server development, performance, and troubleshooting
  • Strong interpersonal communication skills
  • Ability to work well in a globally diverse, team-focused environment
  • Knowledge of data structures, relational and non-relational databases, networking, Linux internals, filesystems, web architecture, and related topics

Benefits For Staff Engineer, Site Reliability

  • Hybrid work option
