As a Site Reliability DevOps Engineer at Oracle, you will be responsible for defining and deploying key services with a deep focus on architecture, production operations, capacity planning, performance management, deployment, and release engineering. You will work with multiple cross-functional teams to deliver new and outstanding experiences to collaborators while ensuring reliability and performance.
Key responsibilities include:
- Understanding end-to-end configuration, technical dependencies, and behavioral characteristics of production services
- Designing and delivering a mission-critical stack with a focus on security, resiliency, scale, and performance
- Partnering with development teams to define and implement service architecture improvements
- Articulating the technical characteristics of services and guiding development teams
- Acting as the ultimate escalation point for complex or critical issues
- Troubleshooting issues and defining mitigations using a deep understanding of service topology
- Demonstrating a clear understanding of automation and orchestration principles
The ideal candidate will have:
- 3 to 5+ years of experience as a Site Reliability Engineer or equivalent
- Strong background in Linux and cloud technologies
- Experience with container administration (Kubernetes, Docker, etc.)
- Proficiency in infrastructure automation (Terraform, Chef, Ansible, etc.)
- Experience with CI/CD pipelines and cloud orchestration frameworks
- Strong scripting skills (PowerShell, Bash, Python, etc.)
- Knowledge of fault-tolerant, highly available, distributed systems
This role offers the opportunity to work on cutting-edge cloud technologies and contribute to Oracle's innovative cloud solutions. Join a diverse and inclusive team committed to solving today's problems with tomorrow's technology.