Aethir, the leading enterprise-grade, AI-focused GPU-as-a-service provider, is seeking a Site Reliability Engineer (SRE) to join its team in Kuala Lumpur, Malaysia. The company operates a revolutionary decentralized cloud computing infrastructure that manages over 40,000 high-end GPUs, including 3,000 NVIDIA H100s, to deliver GPU computing solutions to enterprises worldwide.
Backed by prominent Web3 investors and having raised over $130M in ecosystem funding, Aethir stands at the forefront of decentralized computing innovation. This role presents a unique opportunity to work with cutting-edge technology in a rapidly growing environment.
As an SRE, you'll be instrumental in ensuring the reliability and performance of Aethir's production systems. Your responsibilities will include monitoring, troubleshooting, and system optimization, all of which directly affect service quality for AI and gaming customers worldwide. You'll work with modern technologies including Kubernetes, Docker, and cloud platforms, while collaborating with cross-functional teams to resolve complex technical challenges.
The ideal candidate brings a strong technical foundation in systems architecture and performance monitoring, combined with excellent problem-solving abilities. You'll thrive in a fast-paced startup environment where your work directly influences the platform's success. The role offers significant growth potential, with opportunities to work alongside global teams and contribute to innovative projects in the AI and cloud computing space.
Join Aethir to be part of a transformative journey in decentralized computing, where your expertise will help shape the future of GPU-as-a-service technology. The position offers competitive benefits, career advancement opportunities, and a collaborative work environment focused on innovation and excellence.