As a Site Reliability Engineer (SRE) at Together AI, you will be responsible for maintaining all user-facing services and production systems. This role combines pragmatic operations with software engineering, applying sound engineering principles, operational discipline, and mature automation to our operating environments and codebase.
You will specialize in systems (operating systems, storage subsystems, networking) while implementing best practices for availability, reliability, and scalability. Broad interests in algorithms and distributed systems will serve you well in this role.
Key responsibilities include:
Together AI is at the forefront of AI research and development, contributing to open-source research, models, and datasets. The company is behind advances such as FlashAttention, Hyena, FlexGen, and RedPajama. This role offers an opportunity to join a passionate team of researchers and engineers building the next generation of AI infrastructure.
The position offers competitive compensation, including a base salary range of $160,000 - $230,000, startup equity, health insurance, and other benefits. Together AI is an Equal Opportunity Employer, providing equal employment opportunities regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.
If you're passionate about AI infrastructure and have the skills to keep complex systems running smoothly at scale, this role at Together AI is an opportunity to make a significant impact on the field of artificial intelligence.