Customer Reliability Engineer - Infra

Industry-leading data orchestration and observability platform, powered by Airflow, that accelerates the building of reliable data products.
Site Reliability
Mid-Level Software Engineer
Remote
3+ years of experience
Enterprise SaaS

Description For Customer Reliability Engineer - Infra

Astronomer is seeking a Customer Reliability Engineer to join their Infrastructure team, focusing on their managed Airflow service. This role combines technical expertise with customer service, making it ideal for engineers early in their careers who want to impact customer success directly.

The position involves managing and maintaining cloud infrastructure and Kubernetes clusters, ensuring platform reliability, and responding to customer incidents. As part of the Customer Reliability Engineering (CRE) team, you'll be responsible for operating, monitoring, and maintaining the platform to ensure availability and reliable operations.

The role offers unique exposure to diverse technical challenges across different cloud providers and industries. You'll work with cutting-edge technology in a distributed team environment, contributing to both technical solutions and customer success. The position requires a blend of technical skills in cloud infrastructure, Kubernetes, and distributed systems, along with a strong customer service orientation.

Astronomer offers a globally distributed, venture-backed environment focused on innovation and collaboration. They value diverse experiences and unconventional career paths, making it an excellent opportunity for those with non-traditional backgrounds who meet the core qualifications. The company is remote-first and committed to equal opportunity employment, welcoming candidates from all backgrounds.

Key technical requirements include experience with Kubernetes, cloud providers (AWS, GCP, Azure), Linux systems, and Python scripting. The role involves on-call rotations and requires strong troubleshooting abilities combined with excellent communication skills. This position offers growth opportunities in both technical and customer-facing aspects of platform engineering.

Last updated a day ago

Responsibilities For Customer Reliability Engineer - Infra

  • Provide solutions to customers for product success
  • Troubleshoot customer environments and engage in active triaging
  • Participate in on-call rotation for weekend coverage
  • Provide feedback to product development teams
  • Build out monitoring and alerting systems
  • Build and maintain automation for operational tasks
  • Help direct product architecture and contribute where possible
  • Own the customer experience and provide white-glove guidance
  • Enhance and enrich customer documentation
  • Work with the latest technology and multi-cloud implementations

Requirements For Customer Reliability Engineer - Infra

Python
Kubernetes
Linux
  • 3-4 years of experience with large, complex SaaS infrastructures
  • 2 years of experience with Kubernetes
  • Experience managing production distributed systems on a major cloud provider (AWS, GCP, Azure)
  • Good networking experience with major cloud providers
  • Good Linux experience
  • Knowledge of operating and monitoring distributed systems
  • Experience with observability tools
  • Previous experience handling customer issues
  • Good communication skills
  • DevOps or CI/CD experience
  • Python scripting
  • Good troubleshooting skills
