System Development Manager, AWS Resilience, AWS Incident Response

Amazon is a global technology company that provides cloud computing, e-commerce, artificial intelligence, and digital streaming services.
Backend
Staff Software Engineer
In-Person
5+ years of experience
Enterprise SaaS
This job posting may no longer be active.

Description For System Development Manager, AWS Resilience, AWS Incident Response

AWS Resilience owns the services that prevent and respond to availability and security issues for all AWS services. As a System Development Manager on the AWS Incident Response team, you will manage automated tooling roadmaps and delivery for the detection and resolution of issues within AWS and Amazon infrastructure. You'll also direct the resolution of high-visibility incidents, drive improvements in automation, tooling, and processes, and coordinate across project teams to expand the use of our tooling. Key responsibilities include defining and delivering business priorities, cross-site and cross-team coordination, incident and change management, and performance management and team health. This role offers strong growth potential and an opportunity to make a significant impact on keeping the cloud running.

Last updated 3 months ago

Responsibilities For System Development Manager, AWS Resilience, AWS Incident Response

  • Define, plan, track and deliver strategic goals for the global AWS Incident Response team
  • Coordinate with counterparts to ensure clear communication between AWS Operations teams
  • Work with systems and product teams to create and maintain proper processes for monitoring and alarming on services
  • Manage inquiries regarding engagement processes and issues within the global Amazon platform
  • Drive initiatives to improve existing tools & processes
  • Provide feedback on new practices & procedures to scale with AWS Services expansion
  • Own all facets of performance and career management for the team

Requirements For System Development Manager, AWS Resilience, AWS Incident Response

  • 5+ years of direct experience with cloud hosting technologies (AWS, Azure, etc.)
  • 5+ years of experience managing an engineering team operating at scale
  • Deep understanding of infrastructure delivered through the software development lifecycle in an API-enabled environment
  • Experience in implementing, supporting, and evaluating tools and services with a security, scalability, and performance mindset
  • Ability to handle multiple competing priorities in a fast-paced environment
  • Ability to interact with and influence people at all levels
  • Excellent written and verbal communication skills

Benefits For System Development Manager, AWS Resilience, AWS Incident Response

  • Equal opportunities employer
  • Diverse and inclusive workplace
