Senior GPU Networking Architect

at NVIDIA

📍 Zurich, Switzerland

PLN 292,500-650,000 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others, and you know the pitfalls and tricks.

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Hiring @ 4, Communication @ 4, Networking @ 4, API @ 4, LLM @ 3, PyTorch @ 3, CUDA @ 6, GPU @ 4, Deep Learning @ 4, AI @ 4, InfiniBand @ 4, vLLM @ 3, NCCL @ 4, TensorRT @ 3, NVLink @ 4

Details

NVIDIA is hiring a Senior GPU Networking Architect to join the networking software group and build and improve GPU communication kernels that link GPU computing with networking. The role focuses on developing GPU-resident communication primitives and device-side APIs, optimizing kernel efficiency and latency, and collaborating with software, hardware, and AI framework teams to co-design communication strategies for large-scale AI systems.

Responsibilities

Build, implement, and optimize GPU communication kernels that underpin collective and point-to-point operations in large-scale AI systems.
Leverage deep knowledge of GPU architecture (thread scheduling, memory hierarchy, execution pipelines) to improve kernel efficiency, minimize latency, and overlap computation with communication.
Develop GPU-resident communication primitives and device-side APIs enabling fine-grained, kernel-initiated data movement across nodes and accelerators.
Profile and tune GPU kernels end-to-end; identify bottlenecks at the intersection of compute, memory, and network, and drive targeted optimizations.
Collaborate with network software, hardware, and AI framework teams to co-design communication strategies aligned with GPU execution patterns and emerging model architectures.
Build proofs-of-concept, conduct experiments, and perform quantitative modeling to evaluate and validate new communication strategies before committing them to production.
Contribute to the evolution of programming models that expose GPU-aware networking capabilities to application developers.

Requirements

5+ years of hands-on CUDA programming, including writing and optimizing non-trivial GPU kernels.
M.Sc. or equivalent experience in computer science, computer engineering, or a closely related field.
Strong understanding of GPU architecture fundamentals: warp scheduling, shared memory, L2 cache, memory coalescing, occupancy tuning, and asynchronous execution.
Experience with systems-level C/C++ development in performance-critical environments.
Familiarity with GPU data movement mechanisms such as GPUDirect RDMA and GPU-initiated communication.
Ability to read and reason about GPU performance profiles (e.g., Nsight Compute, Nsight Systems) and translate observations into actionable optimizations.
Strong collaboration skills in a multi-national, interdisciplinary environment.

Preferred / Ways to stand out

Experience developing or optimizing communication kernels in libraries such as NCCL, NVSHMEM, or similar GPU-aware communication frameworks.
Understanding of distributed deep learning parallelism techniques (data, tensor, pipeline, and expert parallelism, including for mixture-of-experts models) and the communication patterns they impose on GPU kernels.
Background in RDMA, InfiniBand, high-speed networking, and GPU system topology (NVLink, NVSwitch, PCIe) and their impact on communication kernel design.
Experience with overlap techniques such as kernel pipelining, persistent kernels, or cooperative groups to hide communication latency behind compute.
Proven experience evaluating and optimizing large-scale LLM training or inference workloads, including hands-on work with frameworks such as PyTorch, TensorRT-LLM, or vLLM, and familiarity with emerging serving architectures such as disaggregated serving.

About compensation and benefits

NVIDIA offers highly competitive salaries and a comprehensive benefits package. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

For Poland: The base salary range is 292,500 PLN - 507,000 PLN for Level 4, and 375,000 PLN - 650,000 PLN for Level 5.

More on benefits: www.nvidiabenefits.com/