Senior Deep Learning Communication Architect

at Nvidia
USD 184,000-356,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 7 Communication @ 4 LLM @ 4 PyTorch @ 6 CUDA @ 3 GPU @ 3

Details

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years, fueling innovation with great technology and amazing people. Today, NVIDIA pioneers AI computing with GPUs that power computers, robots, and self-driving cars capable of understanding the world.

Responsibilities

  • Scale deep neural network (DNN) models and training/inference frameworks to systems with hundreds of thousands of nodes.
  • Optimize communication performance by identifying and eliminating bottlenecks in data transfer and synchronization during distributed deep learning training and inference (see the measurement sketch after this list).
  • Design efficient communication protocols tailored for deep learning workloads to minimize communication overhead and latency.
  • Collaborate with hardware and software teams to co-craft systems that leverage high-speed interconnects (e.g., NVLink, InfiniBand, Spectrum-X) and communication libraries (e.g., MPI, NCCL, UCX, UCC, NVSHMEM).
  • Research and evaluate new communication technologies and techniques to enhance performance and scalability of deep learning systems.
  • Build proofs-of-concept, conduct experiments, and perform quantitative modeling to validate and deploy new communication strategies.
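
The communication work above typically starts from direct measurement. The following is a minimal, illustrative sketch (not part of the posting) that times an NCCL all-reduce with PyTorch's torch.distributed; the buffer size and the torchrun launch are assumptions chosen purely for illustration.

# Minimal sketch: time an NCCL all-reduce across GPUs with torch.distributed.
# Assumes one process per GPU, launched e.g. with `torchrun --nproc_per_node=8 bench.py`.
import os
import time

import torch
import torch.distributed as dist


def main() -> None:
    dist.init_process_group(backend="nccl")          # NCCL backend for GPU collectives
    rank = dist.get_rank()
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    # Gradient-sized buffer; 256 MB of fp32 is an arbitrary illustrative choice.
    tensor = torch.ones(64 * 1024 * 1024, device="cuda")

    # Warm up so one-time setup cost is excluded from the timing.
    for _ in range(5):
        dist.all_reduce(tensor)
    torch.cuda.synchronize()

    iters = 20
    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(tensor)
    torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iters

    if rank == 0:
        gb = tensor.numel() * tensor.element_size() / 1e9
        print(f"all_reduce of {gb:.2f} GB took {elapsed * 1e3:.2f} ms per iteration")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

Comparing the measured time against the interconnect's theoretical bandwidth is one simple way to surface the data-transfer and synchronization bottlenecks described above.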

Requirements

  • Ph.D., Master's, or B.S. in Computer Science, Electrical Engineering, Computer Science and Electrical Engineering, or a closely related field, or equivalent experience.
  • 6+ years of experience in building, scaling, and parallelizing DNNs, or working on deep learning training and inference workloads.
  • Experience evaluating, analyzing, and optimizing LLM training and inference performance on state-of-the-art hardware.
  • Deep understanding of parallelism techniques, including Data Parallelism, Pipeline Parallelism, Tensor Parallelism, Expert Parallelism, and Fully Sharded Data-Parallel (FSDP); a minimal FSDP sketch follows this list.
  • Understanding of emerging serving architectures such as Disaggregated Serving and inference servers like Dynamo and Triton.
  • Proficiency in developing code for deep neural network training and inference frameworks such as PyTorch, TensorRT-LLM, vLLM, and SGLang.
  • Strong programming skills in C++ and Python.
  • Familiarity with GPU computing including CUDA and OpenCL, and networks such as InfiniBand and RoCE.
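
As a concrete reference for the FSDP item above, here is a minimal sketch (an assumption-laden illustration, not taken from the posting) of Fully Sharded Data-Parallel training in PyTorch: parameters, gradients, and optimizer state are sharded across ranks, with all-gather and reduce-scatter collectives issued around each wrapped module. The toy model, batch size, and learning rate are placeholders.

# Minimal FSDP sketch; launch with the same torchrun command as the sketch above.
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main() -> None:
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Toy MLP standing in for a stack of transformer blocks.
    model = nn.Sequential(
        nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)
    ).cuda()

    # Wrapping shards parameters across all ranks; each rank keeps only its
    # shard between forward/backward passes and all-gathers full weights on use.
    sharded = FSDP(model)
    optim = torch.optim.AdamW(sharded.parameters(), lr=1e-4)

    x = torch.randn(8, 1024, device="cuda")
    loss = sharded(x).square().mean()
    loss.backward()        # gradients are reduce-scattered to their owning shards
    optim.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()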

Ways to Stand Out

  • Prior contributions to one or more DNN training and inference frameworks.
  • Deep understanding and contributions to the scaling of large language models (LLMs) on large-scale systems.

With competitive salaries and a generous benefits package, NVIDIA is widely considered one of the most desirable employers in technology. Its engineering teams are growing rapidly and are looking for creative, autonomous engineers who are passionate about technology and cloud services.

Salary

The base salary range is 184,000 USD to 356,500 USD, determined by location, experience, and internal equity. Additionally, employees are eligible for equity and benefits.

NVIDIA is an equal opportunity employer committed to diversity and inclusion in every aspect of employment.