Site Reliability Engineer

As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's problems. True innovation starts with diverse perspectives and various abilities and backgrounds. When everyone's voice is heard, we're inspired to go beyond what's been done before. It's why we're committed to expanding our inclusive workforce that promotes diverse insights and perspectives. We've partnered with industry leaders in almost every sector, and we continue to thrive after 40+ years of change by operating with integrity. Oracle careers open the door to global opportunities where work-life balance flourishes. We offer a highly competitive suite of employee benefits designed on the principles of parity and consistency. We put our people first with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs. We're committed to including people with disabilities at all stages of the employment process. Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law.
Costa Rica
Site Reliability
Mid-Level Software Engineer
Hybrid
5,000+ Employees
3+ years of experience
Enterprise SaaS

Description For Site Reliability Engineer

The Site Reliability Engineer at Oracle works with all deployments of the NetSuite application and ensures uptime, performance, and reliability for all customers in our production environments as we expand into all regions of Oracle Cloud Infrastructure. This is a hybrid role.

Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Design and develop architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.

Work with the Site Reliability Engineering (SRE) team on the shared full-stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Take responsibility for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance. Hold authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate the technical characteristics of services and technology areas, and guide development teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, and performance attributes and requirements of the service and technology stack. Demonstrate a clear understanding of automation and orchestration principles. Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Use a deep understanding of service topology and its dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Bring professional curiosity and a desire to develop a deep understanding of services and technologies.

Responsibilities:

  • Ensure Oracle NetSuite NSGBU Cloud Operations systems are operational
  • Collaborate with other engineering teams to support, design, and implement tooling and automation platforms
  • Resolve site incidents at various levels of the infrastructure
  • Work with monitoring and analytics tools like Kibana, Icinga, and Prometheus/Grafana
  • Participate in NSGBU SRE 24x7 follow-the-sun operational coverage

Qualifications:

  • Knowledge of Linux systems internals, monitoring, networking, and core cloud concepts
  • Understanding of web technologies: Apache, HTTPS/SSL, and web sessions
  • Knowledge of database environments and high-availability environments
  • Excellent analytical and troubleshooting skills
  • BS in Computer Science or a related field
  • Minimum 2 years of experience in a large-scale production operations environment
  • Scripting knowledge in Bash, Perl, Python, or similar
  • Database knowledge (Oracle, NoSQL such as Cassandra and Redis)
  • Knowledge of orchestration and configuration management tools
  • Exposure to distributed platforms like GlusterFS, ZooKeeper, Kafka, and Elasticsearch

This role requires excellent communication skills in English and the ability to work quickly and accurately under pressure in time-critical situations.

Last updated 17 days ago

Responsibilities For Site Reliability Engineer

  • Design, write, and deploy software to improve availability, scalability, and efficiency of Oracle products and services
  • Facilitate service capacity planning, demand forecasting, software performance analysis, and system tuning
  • Act as the ultimate escalation point for complex or critical issues

Requirements For Site Reliability Engineer

Linux
Kubernetes
Python
Redis
Kafka
  • Excellent communication skills in English

Benefits For Site Reliability Engineer

Medical Insurance
Vision Insurance
  • Flexible medical options
  • Life insurance
  • Retirement options
  • Volunteer programs
