Focus
- Build infrastructure for distributed model training
- Optimize compute scheduling across large GPU fleets
- Improve performance of LLM training pipelines
Tech
- PyTorch Distributed
- Ray
- CUDA
- HPC networking (InfiniBand / RDMA)

Darwin Recruitment is acting as an Employment Agency in relation to this vacancy.

