Hardware Reliability Engineer, Infrastructure Reliability & Quality

AWS is the world's most comprehensive and broadly adopted cloud platform, pioneering cloud computing and continuously innovating.
DevOps
Senior Software Engineer
In-Person
5,000+ Employees
5+ years of experience
Enterprise SaaS · Cloud

Description For Hardware Reliability Engineer, Infrastructure Reliability & Quality

AWS Infrastructure Services is seeking a Hardware Reliability Engineer to join the team responsible for keeping the cloud running smoothly. This role combines technical expertise with business acumen, focusing on maintaining and improving the reliability of AWS's global infrastructure.

As a Hardware Reliability Engineer, you'll be at the forefront of ensuring that AWS datacenter infrastructure and security equipment operate at peak efficiency. You'll work with cutting-edge technology, analyzing and mitigating reliability risks for critical systems including cameras, media destruction devices, access control systems, and various power and cooling equipment.

The role requires a unique blend of technical knowledge and analytical skills. You'll use physics-based approaches to evaluate product reliability, conduct lifecycle environmental assessments, and develop system-level reliability models. Your work will directly impact AWS's ability to provide continuous, reliable service to customers worldwide.

You'll join a diverse team of professionals, including software engineers, hardware specialists, and security experts. The collaborative environment encourages knowledge sharing and professional growth, with access to mentorship and career development resources. AWS values work-life harmony and maintains an inclusive culture that welcomes diverse perspectives and bold ideas.

Key responsibilities include driving reliability risk identification, performing root cause analysis of critical failures, and working with both internal teams and external vendors to implement improvements. You'll need strong analytical skills, excellent communication abilities, and a proven track record in reliability engineering.

This is an excellent opportunity for someone who wants to impact cloud computing infrastructure at a global scale. You'll be part of AWS's mission to deliver the highest standards of safety and security while providing seemingly infinite capacity at the lowest possible cost for customers.

The ideal candidate will have at least 5 years of relevant experience and a strong educational background in reliability engineering or related fields. You'll need to be comfortable with both technical analysis and business negotiations, as you'll interface with various stakeholders to drive continuous improvement in datacenter availability.

Last updated 21 hours ago

Responsibilities For Hardware Reliability Engineer, Infrastructure Reliability & Quality

  • Drive reliability risk identification, assessment and mitigation for datacenter infrastructure & security equipment
  • Perform root cause analysis of critical equipment failures
  • Drive continuous improvements in datacenter availability & security
  • Work with internal and external partners including suppliers
  • Develop a datacenter system-level reliability model
  • Monitor product performance and drive corrective actions
  • Conduct vendor audits and quarterly reviews
  • Drive AWS application-specific requirements in lifecycle environmental and operational stress analysis

Requirements For Hardware Reliability Engineer, Infrastructure Reliability & Quality

Linux
Kubernetes
  • Bachelor's or Master's degree in Reliability Engineering, Physics, Electrical, Mechanical or Materials Engineering or related field
  • 5+ years of Reliability Engineering work experience in a high-reliability industry
  • 5+ years of experience with failure analysis activities and root cause analysis
  • 5+ years of experience with accelerated life testing, stress analysis and finite element analysis
  • Knowledge of statistical techniques and models
  • Ability to travel within the US and internationally

Benefits For Hardware Reliability Engineer, Infrastructure Reliability & Quality

Medical Insurance
Dental Insurance
Vision Insurance
  • Work-life harmony
  • Career development opportunities
  • Mentorship programs
  • Inclusive culture
  • Employee-led affinity groups
