Oracle is seeking a Site Reliability Engineer specializing in AIOps to join its team. This role is an opportunity to shape Oracle's AIOps offerings from the ground up, working at the intersection of artificial intelligence and infrastructure reliability.
The position involves designing and implementing AI/ML solutions for cloud operations, focusing on improving system reliability, scalability, and efficiency. You'll be responsible for developing sophisticated algorithms for anomaly detection, predictive maintenance, and automated incident response, while working with large-scale monitoring and telemetry data.
As an SRE-AIOps engineer, you'll collaborate closely with cloud architects, data engineers, and SRE teams to transform how Oracle ensures reliability at scale. The role requires expertise in both traditional SRE practices and modern AI/ML technologies, making it ideal for candidates who want to be at the forefront of implementing AI-driven solutions in cloud operations.
Key technical requirements include strong experience with Python, AI/ML frameworks, and cloud platforms, combined with a deep understanding of SRE principles and practices. You'll work with tools and technologies including Kubernetes, Prometheus, and various AIOps platforms.
Oracle offers a comprehensive benefits package, including medical insurance, retirement options, and professional development opportunities. The company's commitment to innovation, combined with its global reach and established market position, makes this an excellent opportunity for someone looking to make a significant impact in the AIOps field.
The role offers the chance to work on challenging problems at scale, implement state-of-the-art solutions, and shape the future of cloud operations through AI innovation. If you're passionate about combining SRE practices with artificial intelligence to solve complex operational challenges, this position is an exciting career opportunity.