Site Reliability Engineer – AIOps

A world leader in cloud solutions, using tomorrow's technology to tackle today's problems with 40+ years of experience.
Site Reliability
Staff Software Engineer
In-Person
5,000+ Employees
6+ years of experience
AI · Enterprise SaaS · Cloud

Description For Site Reliability Engineer – AIOps

Oracle is seeking a Site Reliability Engineer specializing in AIOps to join its innovative team. This role is a unique opportunity to shape cutting-edge AIOps offerings from the ground up, working at the intersection of artificial intelligence and infrastructure reliability.

The position involves designing and implementing AI/ML solutions for cloud operations that improve system reliability, scalability, and efficiency. You'll develop algorithms for anomaly detection, predictive maintenance, and automated incident response, working with large-scale monitoring and telemetry data.

As an SRE-AIOps engineer, you'll collaborate closely with cloud architects, data engineers, and SRE teams to transform how Oracle ensures reliability at scale. The role requires expertise in both traditional SRE practices and modern AI/ML technologies, making it ideal for candidates who want to be at the forefront of implementing AI-driven solutions in cloud operations.

Key technical requirements include strong experience with Python, AI/ML frameworks, and cloud platforms, combined with a deep understanding of SRE principles and practices. You'll work with tools and technologies such as Kubernetes, Prometheus, and various AIOps platforms.

Oracle offers a comprehensive benefits package, including medical insurance, retirement options, and professional development opportunities. The company's commitment to innovation, combined with its global reach and established market position, makes this an excellent opportunity for someone looking to make a significant impact in the AIOps field.

The role offers the chance to work on challenging problems at scale, implement state-of-the-art solutions, and shape the future of cloud operations through AI. If you're passionate about combining SRE practices with artificial intelligence to solve complex operational challenges, this position is an exciting career opportunity.


Responsibilities For Site Reliability Engineer – AIOps

  • Design, build, and deploy AI/ML models for monitoring and telemetry data analysis
  • Develop algorithms for anomaly detection and root cause analysis
  • Implement AI-powered automation for incident management
  • Design data pipelines for monitoring and log data
  • Build dashboards and visualizations for AI-driven insights
  • Partner with the SRE team on reliability initiatives
  • Research and implement state-of-the-art AIOps tools
  • Mentor junior engineers in AI/ML methodologies

Requirements For Site Reliability Engineer – AIOps

Python
Kubernetes
  • 3+ years of experience in machine learning, data science, or AI-driven automation
  • Proficiency in Python, TensorFlow, and PyTorch
  • Experience with cloud platforms (OCI, AWS, Azure, GCP)
  • Knowledge of cloud monitoring tools (Prometheus, Grafana)
  • Experience with large-scale data processing tools
  • Knowledge of SRE principles and incident response
  • Bachelor's or Master's degree in Computer Science or related field

Benefits For Site Reliability Engineer – AIOps

  • Medical insurance (flexible plans)
  • Vision insurance
  • Dental insurance
  • Life insurance
  • Retirement options
  • Volunteer programs
