Senior Deep Learning Communication Architect

at Nvidia
USD 184,000-356,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Python @ 7
  • Algorithms @ 4
  • Communication @ 4
  • LLM @ 4
  • PyTorch @ 6
  • CUDA @ 3
  • GPU @ 3

Details

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today, we are tapping into the unlimited potential of AI to define the next era of computing. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work.

Responsibilities

  • Scale DNN models and training/inference frameworks to systems with hundreds of thousands of nodes.
  • Identify and eliminate bottlenecks in data transfer and synchronization during distributed deep learning training and inference to optimize communication performance.
  • Design and implement efficient communication algorithms and protocols tailored for deep learning workloads to minimize communication overhead and latency.
  • Collaborate with hardware and software teams on hardware/software co-design to effectively apply high-speed interconnects (e.g., NVLink, InfiniBand, Spectrum-X) and communication libraries (e.g., MPI, NCCL, UCX, UCC, NVSHMEM).
  • Research, evaluate, and explore new communication technologies and techniques to enhance performance and scalability of deep learning systems.
  • Build proofs-of-concept, conduct experiments, and perform quantitative modeling to validate and deploy new communication strategies.

Requirements

  • A Ph.D., Master's, or B.S. in Computer Science (CS), Electrical Engineering (EE), Computer Science and Electrical Engineering (CSEE), or a closely related field, or equivalent experience.
  • 6+ years of experience building DNNs, scaling DNNs, parallelizing DNN frameworks, or optimizing deep learning training and inference workloads.
  • Experience evaluating, analyzing, and optimizing LLM training and inference performance of state-of-the-art models on cutting-edge hardware.
  • Deep understanding of parallelism techniques, including Data Parallelism, Pipeline Parallelism, Tensor Parallelism, Expert Parallelism, and FSDP.
  • Understanding of emerging serving architectures like Disaggregated Serving and inference servers such as Dynamo and Triton.
  • Proficiency developing code for one or more DNN training and inference frameworks, such as PyTorch, TensorRT-LLM, vLLM, or SGLang.
  • Strong programming skills in C++ and Python.
  • Familiarity with GPU computing (including CUDA and OpenCL) and with high-performance networks including InfiniBand and RoCE.

Ways to Stand Out

  • Prior contributions to one or more DNN training and inference frameworks.
  • Deep understanding of, and contributions to, scaling LLMs on large-scale systems.

Compensation & Benefits

  • Base salary range: 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
  • You will also be eligible for equity and benefits (see NVIDIA benefits).

Additional Information

  • Applications accepted at least until October 13, 2025.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.