Meta (Facebook)
Software Engineering Manager, AI Networking
Menlo Park, CA
5 months ago
In this role, you will be a member of the Network AI Software team, which is part of the broader DC networking organization. The team develops and owns the software stack around collective communication libraries across Meta.
At a high level, the team aims to enable Meta-wide ML products and innovations to leverage our large-scale training and inference fleet through an observable, reliable and high-performance distributed AI communication stack. Currently, one of the team's focuses is building customized features, SW benchmarks, performance tuners and SW stacks around PyTorch to improve full-stack distributed ML reliability and performance (e.g. large-scale GenAI/LLM training) from the trainer down to the network communication layer. We are seeking leaders to work in the space of GenAI/LLM scaling reliability and performance.