Acquia is seeking a Staff Site Reliability Engineer to play a key role in designing, implementing, and maintaining CI/CD pipelines, cloud infrastructure, and monitoring solutions. This hands-on position requires expertise in tools such as ArgoCD and Kubernetes, as well as in cloud-native architecture, to achieve operational excellence at scale. The ideal candidate will work closely with engineering teams to ensure rapid, safe, and reliable deployments.
Key responsibilities include:
- Designing and maintaining CI/CD pipelines using tools such as ArgoCD and Jenkins
- Building and managing scalable infrastructure with Terraform and Kubernetes
- Architecting cloud environments (AWS, GCP, or Azure) for optimal performance and cost
- Implementing comprehensive monitoring solutions with Prometheus, Grafana, ELK, and Datadog
- Championing DevOps culture and best practices across teams
- Building resilient systems and implementing Service Level Objectives (SLOs)
- Collaborating with security teams to implement robust security practices
- Working closely with product development teams to integrate CI/CD practices
Required skills:
- BS in Computer Science or equivalent experience
- Proficiency in languages such as Go, Python, Ruby, PHP, Java, or JavaScript
- Strong Unix/Linux administration skills
- Expertise in CI/CD tools, Kubernetes, cloud platforms, and Infrastructure as Code
- Experience with monitoring and observability tools
- Security-focused mindset and excellent problem-solving abilities
Preferred qualifications:
- 8-13 years of hands-on DevOps or SRE experience
- Deep knowledge of ArgoCD or similar tools
- Strong scripting skills in Python, Go, or Bash
- Experience with service mesh architectures
- An SRE certification and the Certified Kubernetes Administrator (CKA) certification are a plus
Join Acquia, a global leader in digital experience platforms, and be part of building the future of digital customer experiences.