Site Reliability Engineer

Pure Storage redefines the storage experience and empowers innovators by simplifying how people consume and interact with data.
Site Reliability
Mid-Level Software Engineer
In-Person
5+ years of experience
Enterprise SaaS

Description For Site Reliability Engineer

Pure Storage is seeking a Site Reliability Engineer to join their Infrastructure Shared Service (ISS) team in Bengaluru, India. As an SRE, you'll work on improving the reliability and performance of Pure Storage's critical infrastructure applications. You'll be responsible for setting and owning SLO goals for uptime and latency, as well as helping colleagues leverage available features and workflows. The role involves working with backend web servers, load balancers, and database servers to ensure they run smoothly.

Key responsibilities include:

  1. Engaging in the entire lifecycle of services from design to operation
  2. Designing, operating, and troubleshooting enterprise systems
  3. Establishing sustainable incident response and blameless postmortems
  4. Supporting services pre-launch through system design and capacity planning
  5. Scaling systems through automation and scripting
  6. Collaborating with development teams and stakeholders across time zones
  7. Ensuring hardware design meets business and technical requirements
  8. Maintaining documentation on system configurations and procedures
  9. Performing day-to-day server, storage, and network administration
  10. Deploying infrastructure manually and via automation platforms
  11. Troubleshooting and resolving hardware, software, and network issues

The ideal candidate should have:

  • 5+ years of experience as an SRE, DevOps Engineer, or Infrastructure Engineer
  • Strong programming skills in Python or other languages
  • Experience with distributed systems, Linux environments, and VMware
  • Familiarity with observability platforms like Elastic or DataDog
  • Knowledge of Infrastructure as Code tools (Ansible, Terraform)
  • Experience with containerization and cloud environments (AWS & Azure)

This role offers the opportunity to work on cutting-edge technology in a fast-paced environment, contributing to the success of a company that's revolutionizing data storage and management. Join Pure Storage to be part of building the future of data infrastructure.


Responsibilities For Site Reliability Engineer

  • Engage in and improve the whole lifecycle of services—from inception and design, through deployment and operation
  • Design, operate, maintain, and troubleshoot enterprise systems such as databases, message queues, APIs, and distributed applications
  • Establish and practice sustainable incident response and blameless postmortems to prevent problem recurrence
  • Support services before they go live through activities such as system design, developing software platforms and frameworks, capacity planning, and launch reviews
  • Scale systems sustainably through mechanisms like scripting and automation
  • Work closely with development teams, infrastructure teams, and business stakeholders across multiple time zones
  • Ensure that hardware design meets business and technical requirements, including performance, scalability, and reliability
  • Create and maintain detailed documentation on system configurations, procedures, and operational policies
  • Perform day-to-day server administration (physical and virtual), storage administration, network configuration, and application support
  • Deploy infrastructure both manually and via configuration management and automation platforms
  • Troubleshoot hardware, software, and network-related issues, resolve them quickly, and perform root cause analysis

Requirements For Site Reliability Engineer

Python
Linux
Kubernetes
  • Experience programming in Python or other languages
  • Experience in designing, analysing, and troubleshooting large-scale distributed systems
  • Able to work in a 24x7 on-call rotation (approx. 1 week every 2 months)
  • Systematic problem-solving approach, strong communication skills, and a sense of ownership and drive
  • Working experience with observability platforms such as Elastic or DataDog
  • Experience deploying and troubleshooting Linux systems (Red Hat, CentOS, Ubuntu) as well as VMware environments (ESXi, NSX, vSAN)
  • Experience working directly with end users to determine deployment and configuration requirements
  • Ability to lift 15+ kilograms when working with storage equipment

Benefits For Site Reliability Engineer

  • Flexible time off
  • Wellness resources
  • Company-sponsored team events
