Principal Architect, AI Networking

at NVIDIA

📍 Santa Clara, United States

USD 272,000-431,250 per year

SENIOR

✅ On-site

Used Tools & Technologies

Machine Learning

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Python @ 6, Communication @ 4, Networking @ 4, Rust @ 6, LLM @ 4, CUDA @ 4, GPU @ 4, AI @ 4, InfiniBand @ 8, vLLM @ 4, NCCL @ 8, TensorRT @ 4, SGLang @ 4, NVLink @ 4

Details

An applied research team within NVIDIA’s Networking Systems & Software Architecture group is solving some of AI’s hardest infrastructure problems. The team builds systems-level software that moves data between GPUs, nodes, and storage at the speed modern AI demands—spanning low-level transport optimization, hardware-software co-design, and communication frameworks that plug directly into production AI stacks. The team's charter expands into emerging domains including quantum computing interconnects.

This Principal Architect role leads the research agenda and architectural direction for how NVIDIA’s AI systems communicate at scale—across GPUs, DPUs, NICs, and heterogeneous storage. It requires someone who defines project scope from scratch, publishes original work, and translates research breakthroughs into production-grade software that ships industry-wide.

Responsibilities

Set the long-term technical vision for distributed AI communication systems—GPU-to-GPU, GPU-to-storage, and cross-node data movement.
Conduct original research and prototype next-generation networking solutions over RDMA, NVLink, and GPUDirect.
Drive hardware-software co-optimization across GPUs, DPUs, NICs, and network switches.
Investigate fundamental bottlenecks in communication runtimes for large-scale AI workloads (KV cache transfer, disaggregated prefill/decode, model parallelism).
Integrate networking capabilities into AI serving stacks such as vLLM, SGLang, and TensorRT-LLM.
Publish findings, represent NVIDIA in industry forums and standards bodies, and mentor senior engineers across the organization.

Requirements

15+ years in systems software and/or networking with deep expertise in high-performance networking (InfiniBand, RoCE, RDMA, NVLink), communication libraries (e.g., NIXL, NCCL, UCX, MPI, NVSHMEM), and GPU-accelerated systems.
Track record of defining and delivering complex, cross-team technical initiatives from research concept to production.
MS, PhD or equivalent experience in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
Deep understanding of computer architecture, memory hierarchies, DMA engines, and OS-level networking.
Understanding of ML systems concepts—transformer architectures, KV cache mechanics, model parallelism, or distributed training and inference patterns.
Proficiency in programming languages such as C, C++, Rust, and Python.

Ways to stand out

Knowledge of ML inference frameworks (vLLM, SGLang, TensorRT-LLM) and their communication requirements.
CUDA programming and NVIDIA GPU architecture expertise.
Proven experience influencing product strategy and technical roadmap at a senior level.
Major open-source contributions.

Compensation & Other Details

Base salary range: 272,000 USD - 431,250 USD (determined based on location, experience, and pay of employees in similar positions).
Eligible for equity and benefits.
Applications accepted at least until April 27, 2026.
NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer.