AI & Machine Learning Site Reliability Engineer

Oomnitza provides an Enterprise Technology Management platform that orchestrates and automates key business processes for IT through SaaS solutions.
Galway, Ireland
Site Reliability
Senior Software Engineer
Remote
5+ years of experience
AI · Enterprise SaaS

Description For AI & Machine Learning Site Reliability Engineer

Oomnitza, a leading Enterprise Technology Management platform provider, is seeking an AI & ML Site Reliability Engineer to drive its AI and data product management innovations. This role combines site reliability engineering with AI/ML expertise, focusing on building and maintaining infrastructure for machine learning operations. The position offers an opportunity to work with cutting-edge technologies, including vector databases, knowledge graphs, and large language models. You'll be responsible for architecting scalable AI systems, implementing RAG solutions, and ensuring robust ML model deployment pipelines. The role requires strong technical expertise in cloud platforms, containerization, and ML frameworks, combined with the ability to collaborate across teams. Working in a venture-backed company with a progressive culture, you'll have the chance to shape the future of enterprise technology management while enjoying comprehensive benefits and flexible work arrangements. The position offers significant growth potential: you'll work directly with the founders and help scale a fast-growing business backed by notable investors.

Responsibilities For AI & Machine Learning Site Reliability Engineer

  • Build and maintain a big data analytics platform
  • Design and build scalable AI infrastructure
  • Implement and manage vector databases and knowledge graphs
  • Develop retrieval-augmented generation systems
  • Train and optimize large language models
  • Deploy, manage, and monitor ML models in production
  • Implement CI/CD processes for machine learning
  • Develop and manage AI agents for task automation
  • Ensure model performance monitoring and governance
  • Collaborate with data scientists and cross-functional teams

Requirements For AI & Machine Learning Site Reliability Engineer

Python
Kubernetes
  • Bachelor's degree in Computer Science, Engineering, Data Science, or related field
  • 5+ years of experience in site reliability engineering, DevOps, or MLOps
  • Experience with cloud platforms (AWS, GCP, Azure)
  • Proficient in deploying machine learning models
  • Experience with data processing tools
  • Strong understanding of vector databases and knowledge graph tools
  • Experience with containerization and orchestration technologies
  • Proficiency in Python and ML tools
  • Experience in on-call incident response
  • Excellent communication and collaboration skills

Benefits For AI & Machine Learning Site Reliability Engineer

Dental Insurance
Vision Insurance
Medical Insurance
Equity
  • Healthcare for dependents and spouse
  • Dental & Vision Insurance
  • Employee equity plan
  • Pension, life insurance, and income protection
  • Remote working & flexible work schedules
  • Working from home equipment allowance
  • Choice of preferred equipment (Mac or PC)
  • Regular social events and workshops
