Oracle Corporation's 'SaaS Engineering' organization is building an exciting new team to advance service reliability using teams of autonomous AI agents. The initiative aims to develop a robust system that uses advanced ML/AI tools to analyze system logs, predict failures, and autonomously resolve issues before they impact cloud services. The project combines two cutting-edge domains, anomaly detection and autonomous AI agents, to enhance service resiliency.
Key Requirements:
• 7-10 years of experience in machine learning engineering
• Strong expertise in Python and deep learning frameworks (PyTorch, TensorFlow)
• Experience with time-series data analysis, feature engineering, and model optimization
• Knowledge of cloud services, containerization, and microservices architecture
• Experience with AI agent frameworks and libraries
• Expertise in anomaly detection systems for system logs
• Advanced degree (Master's or Ph.D.) in Computer Science, Machine Learning, Data Science, or related field
Responsibilities:
• Lead ML model development for detecting anomalies in system logs
• Architect and implement scalable machine learning solutions
• Optimize and fine-tune models for high performance and reliability
• Mentor junior ML engineers and collaborate with cross-functional teams
• Stay updated with the latest advancements in machine learning and AI
• Integrate machine learning models into production environments
• Ensure thorough documentation and regular reporting on project progress
This role offers a comprehensive benefits package including medical, dental, and vision insurance, 401(k) with company match, flexible vacation, paid parental leave, and more. The position is open to candidates in Redwood City, CA; Austin, TX; and Seattle, WA.