Microsoft's Azure Machine Learning team is seeking a Principal Software Engineer to join its Inference team, working directly with OpenAI to host models efficiently on Azure. This role is at the forefront of serving millions of requests per day for Microsoft and third-party (3P) Copilots. The position focuses on optimizing Large Language Models (LLMs) and diffusion models for inference at high scale and low latency.
The successful candidate will work with cutting-edge hardware and software stacks, engaging directly with key partners to implement complex inference capabilities. This role offers the opportunity to work on one of the largest GPU fleets in the world, supporting critical AI infrastructure that powers a range of Copilot services.
The position requires strong expertise in C++ and Python, with additional value placed on Rust and CUDA experience. You'll be part of Microsoft's mission to democratize ML and make it available to every enterprise, developer, and data scientist. The role offers competitive compensation, comprehensive benefits, and the chance to work with state-of-the-art AI technology.
Working in a fast-paced, startup-like environment, you'll collaborate with top talent in AI and machine learning while having the stability and resources of a major tech company. This is an excellent opportunity for someone passionate about AI infrastructure and optimization who wants to make a significant impact on how AI technology is deployed.