Site Reliability Engineer (Remote)

Perlego is a company working to make education accessible to all by providing digital access to quality books and building a platform that helps students study smarter and more effectively.
CA$105,000
Site Reliability
Senior Software Engineer
Remote
51 - 100 Employees
5+ years of experience
Education · Enterprise SaaS

Description For Site Reliability Engineer (Remote)

At Perlego, we're on a mission to make education accessible to all. We believe that in the digital age, anyone should be able to learn anything at any time, without being hindered by high costs. Our team of over 100 people is working hard to support students across the UK & Europe in accessing quality books. Our next goals are to expand our support globally and build a product that goes beyond books, helping students study more effectively.

We're seeking an experienced Site Reliability Engineer (SRE) with strong expertise in AWS services and monitoring tools. In this role, you'll be crucial to the availability and reliability of our services, especially outside office hours. You'll be responsible for addressing issues swiftly and resolving incidents independently in a fast-paced environment.

Key responsibilities include:

  1. Monitoring & Incident Management:
    • Monitor platform activity using tools like Datadog, Prometheus, Grafana, or AWS CloudWatch.
    • Respond quickly to alerts and incidents, independently resolving issues during off-peak hours.
    • Conduct post-incident reviews and improve system resiliency.
  2. Cloud Infrastructure Management:
    • Manage AWS infrastructure, focusing on scalability, security, and reliability.
    • Handle deployments and manage CI/CD pipelines for containerized and serverless applications.
    • Ensure effective backup, recovery, and disaster recovery strategies.
  3. Collaboration & Communication:
    • Work with cross-functional teams to implement platform improvements.
    • Make swift decisions when managing service incidents outside core business hours.
    • Assist in platform security and compliance.
  4. Continuous Improvement:
    • Automate manual processes to reduce errors and improve efficiency.
    • Enhance monitoring systems for robust early detection and resolution.
    • Identify and address performance bottlenecks.

This role is ideal for someone with experience in Site Reliability Engineering or DevOps, strong AWS expertise, proficiency with monitoring tools, and experience with CI/CD pipelines. You should be skilled in Linux-based systems, Infrastructure as Code, and incident management. Strong communication skills and the ability to work independently across time zones are crucial.

Join us at Perlego and be part of a unique company culture that champions self-empowerment, personal development, and mutual support. We offer competitive compensation, learning and development opportunities, flexible work arrangements, and a range of benefits to support your well-being and work-life balance.

Last updated 2 months ago

Responsibilities For Site Reliability Engineer (Remote)

  • Monitor and manage platform activity using tools like Datadog, Prometheus, Grafana, or AWS CloudWatch
  • Respond quickly to alerts and incidents, independently resolving issues during off-peak hours
  • Manage and support AWS infrastructure, focusing on scalability, security, and reliability
  • Handle deployments and manage CI/CD pipelines for containerized and serverless applications
  • Collaborate with cross-functional teams to implement platform improvements
  • Assist in platform security, ensuring adherence to best practices for cloud security and compliance
  • Automate manual processes to reduce human error and improve efficiency
  • Continuously enhance monitoring systems and contribute to overall platform optimization

Requirements For Site Reliability Engineer (Remote)

Kubernetes
Linux
  • Experience in Site Reliability Engineering, DevOps, or a similar field
  • Strong experience with AWS services
  • Expertise in using monitoring tools (e.g. Prometheus, Grafana, CloudWatch)
  • Hands-on experience with CI/CD pipeline management
  • Proficiency in Linux-based operating systems and shell scripting
  • Familiarity with Infrastructure as Code tools (Terraform, CloudFormation)
  • Experience with incident management, troubleshooting, and platform recovery
  • Strong communication skills and ability to work independently across time zones

Benefits For Site Reliability Engineer (Remote)

Equity
Medical Insurance
Dental Insurance
Vision Insurance
Mental Health Assistance
Parental Leave
Education Budget
  • Competitive salary of CA$105,000
  • Share options
  • Personal L&D budget for online courses, subscriptions, or books
  • Unlimited access to MoreHappi, an on-demand professional coaching platform
  • Dedicated Learning Time
  • 30 days off (including bank holidays), plus 1 additional day per year of service, up to a maximum of 35 days
  • Flexible bank holidays
  • Office Reset days between Boxing Day and New Year
  • Sabbatical opportunities
  • Personal days for life events
  • Private medical insurance
  • Competitive matched parental leave
  • Phased return to work from extended leave
