Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others and you know the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Algorithms @ 4
Leadership @ 4
Technical Proficiency @ 6
Communication @ 4
Networking @ 6
LLM @ 7
PyTorch @ 4
CUDA @ 4
GPU @ 4
Deep Learning @ 8
AI @ 4
InfiniBand @ 4
Agentic AI @ 4
vLLM @ 7
NCCL @ 4
TensorRT @ 7
SGLang @ 7
HPC @ 4
NVLink @ 4
JAX @ 4
Details
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today NVIDIA is tapping into the unlimited potential of AI to define the next era of computing. As an NVIDIAN, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work.
Responsibilities
- Architecture leadership: define the long-term technical roadmap for communication libraries across NVIDIA's next-generation platforms and ensure seamless scaling of models to clusters comprising hundreds of thousands of nodes.
- AI communication library design: lead development of next-generation communication primitives and collective algorithms, optimizing for heterogeneous interconnects such as NVLink, Spectrum-X (Ethernet), and Quantum-X (InfiniBand).
- Application–communication library co-design: partner with application developers to architect and implement specialized communication primitives, and ensure AI and HPC libraries (including NCCL, NIXL, NVSHMEM, UCC, and UCX) evolve to meet the requirements of trillion-parameter models and Agentic AI.
- Hardware/software co-design: collaborate with silicon architects and software engineers to influence hardware specifications for next-generation networking to meet the demands of trillion-parameter LLMs and Agentic AI.
- Quantitative modeling: develop high-fidelity analytical models and simulators to predict system behavior under emerging workloads.
Requirements
- Ph.D. or M.S. in Computer Science, Electrical Engineering, or a related field (or equivalent experience), with 12+ years of industry experience in high-performance computing (HPC) or distributed deep learning.
- Parallelism expertise: deep understanding of 3D parallelism (Data, Tensor, Pipeline) and advanced strategies including Context Parallelism, Expert Parallelism, and Zero Redundancy Optimizer (ZeRO) variants.
- Technical proficiency with communication runtimes and libraries such as NCCL, UCX, UCC, NVSHMEM, or MPI.
- Experience with RDMA, RoCE, and low-level InfiniBand verbs (required).
- Advanced knowledge of high-throughput inference engines and schedulers, specifically TensorRT-LLM, vLLM, SGLang, and NVIDIA Dynamo.
- Expert knowledge of NVIDIA GPU memory hierarchy (HBM3e/HBM4, L2 cache) and CUDA programming models.
Ways to Stand Out
- Hands-on framework development experience with Megatron-Core, DeepSpeed, or JAX/XLA, and an understanding of how these frameworks interact with low-level communication runtimes.
- Significant upstream contributions to major open-source projects (e.g., PyTorch Distributed, KServe, or Ray).
- Proven track record of deploying and optimizing models on NVIDIA platforms or similar rack-scale systems.
- Strong portfolio of patents or papers in top-tier systems/architecture venues (e.g., ISCA, ASPLOS, NeurIPS, SC).
Compensation & Benefits
- Base salary range: 272,000 USD - 431,250 USD (base salary determined based on location, experience, and pay of employees in similar positions).
- Eligible for equity and benefits.
Other Information
- Applications for this job will be accepted at least until April 18, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.