Site Reliability Engineer – AIOps

World leader in cloud solutions, using tomorrow's technology to tackle today's challenges. Operating for 40+ years with integrity and partnering with industry leaders across sectors.
Site Reliability
Staff Software Engineer
In-Person
5,000+ Employees
6+ years of experience
AI · Enterprise SaaS · Cloud

Description For Site Reliability Engineer – AIOps

Oracle is seeking a Site Reliability Engineer specializing in AIOps to join its team. The role is an opportunity to shape AIOps offerings from the ground up. It involves solving complex problems in infrastructure cloud services and building automation to prevent those problems from recurring. You'll work at the intersection of AI and Site Reliability Engineering, designing and deploying software to improve the availability, scalability, and efficiency of Oracle products and services.

The role requires expertise in both AI/ML and cloud operations, with a focus on implementing AI-driven solutions for system reliability. You'll be collaborating with cloud architects, data engineers, and SRE teams to transform how Oracle ensures reliability at scale. The position involves working with large-scale monitoring systems, developing AI models for anomaly detection, and creating automated solutions for incident management.

As an IC5-level position, the role calls for 6 to 10+ years of experience and significant technical leadership responsibilities. It offers the chance to work with current AIOps technology in a global company with a 40+ year track record. Oracle provides competitive benefits and promotes work-life balance, which makes this a good fit for experienced SRE professionals who want to work in AI Operations.

Responsibilities For Site Reliability Engineer – AIOps

  • Design, build, and deploy AI/ML models for analyzing large-scale monitoring and telemetry data
  • Develop algorithms for anomaly detection and predictive maintenance
  • Implement AI-powered automation for incident management
  • Design data pipelines for monitoring and log data
  • Build dashboards and visualizations for AI-driven insights
  • Partner with the SRE team to align AIOps initiatives
  • Integrate AI tools into observability platforms
  • Research and implement state-of-the-art AIOps tools
  • Mentor junior engineers in AI/ML methodologies

Requirements For Site Reliability Engineer – AIOps

Python
Kubernetes
  • 3+ years experience in machine learning, data science, or AI-driven automation
  • Proficiency in Python, TensorFlow, PyTorch
  • Experience with cloud platforms (OCI, AWS, Azure, GCP)
  • Knowledge of cloud monitoring tools (Prometheus, Grafana)
  • Experience with large-scale data processing (Kafka, Spark)
  • Knowledge of SRE principles
  • Bachelor's or Master's degree in Computer Science or related field
  • English language proficiency

Benefits For Site Reliability Engineer – AIOps

  • Medical Insurance
  • Life Insurance
  • 401k
  • Volunteer programs
