Customers rely on Oracle Cloud Infrastructure (OCI) to power their business as they tackle some of the world's biggest challenges. We're looking for Senior Site Reliability Developers/Engineers who will be responsible for Advanced Operations (AO) and for critical issues in production environments, including systems and databases, supporting critical business operations. You will perform administration and analysis for multiple production environments and recommend novel solutions to improve availability, performance, and supportability. This is an opportunity to combine deep technical knowledge with administration and analysis knowledge of Oracle's Cloud Infrastructure to provide critical issue support for a wide range of complex production environment problems related to immense growth, scaling, cloud utilization, extremely high performance, and high availability requirements.
Responsibilities:
- Work in Advanced Operations (AO) on the Site Reliability Development/Engineering (SRD/SRE) team for US Gov Operations with shared full stack ownership of services and technology areas.
- Understand end-to-end configuration, technical dependencies, and behavioral characteristics of production services.
- Design and deliver critically important stack components, focusing on security, resiliency, scale, and performance.
- Partner with development teams and Point Operations (PO) to improve service architecture.
- Act as the ultimate escalation point for complex or critical issues not yet documented as Standard Operating Procedures (SOPs).
- Apply deep understanding of service topology and dependencies to solve issues and define mitigations.
Requirements:
- U.S. Citizenship
- Bachelor's Degree in Computer Science or another STEM-related field (M.S. an advantage)
- Experience with Linux (or any UNIX OS) System Administration, Networking, Storage, Compute, and Virtualization
- Strong familiarity with cloud concepts, platforms, distributed systems, and networking
- Strong Cybersecurity awareness/experience
- Experience participating in and leading incident bridges
- Customer obsession and passion for delighting customers
- Ability to quickly learn new technical domains and train others
Benefits:
- Competitive salary with exciting benefits
- Flexible and remote working options
- Learning and development opportunities
- Employee Assistance Program for mental health support
- Employee resource groups
- Core benefits including medical, life insurance, and retirement planning
- Inclusive culture
Work Schedule:
Monday through Friday core hours with on-call shift rotations for escalated incident management (10-15% of the year). Overtime is not generally expected but may occur in rare, extreme cases based on business needs.