Senior HPC and AI Network Software Architect

at NVIDIA
PLN 221,250-507,000 per year
SENIOR
✅ On-site

Used Tools & Technologies

LLM

Required Skills & Competences

Python @ 7, TensorFlow @ 4, Hiring @ 4, Communication @ 4, Networking @ 6, HTTP @ 4, PyTorch @ 4, CUDA @ 7, GPU @ 7, AI @ 4, NCCL @ 4, HPC @ 4, JAX @ 4

Details

NVIDIA is hiring to build the next generation of scalable AI infrastructure focused on distributed training, real-time inference, and communication efficiency across large systems. The role involves designing software and hardware approaches, shaping platform evolution, and collaborating with researchers and engineers to deliver systems that power large-scale AI workloads.

Responsibilities

  • Build and evolve the architecture of scalable software systems for distributed AI training and inference, focusing on throughput, latency, resiliency, and memory efficiency across cluster-scale deployments.
  • Develop and evaluate next-generation communication and runtime capabilities in libraries such as NCCL, UCX, and UCC, tailored to frontier AI workloads.
  • Partner with AI framework teams (e.g., TensorFlow, PyTorch, JAX) and internal platform teams to build integrations, explore new approaches, and improve end-to-end performance and reliability.
  • Collaborate on hardware and system-level features across GPUs, DPUs, and interconnects to speed up data movement and enable new capabilities for training, inference, and model serving at scale.
  • Drive innovation across runtime systems, communication libraries, and AI-specific protocol layers, turning ideas into practical capabilities and robust implementations.

Requirements

  • Ph.D. or equivalent industry experience in computer science, computer engineering, or a closely related field.
  • 5+ years of experience in systems programming, parallel or distributed computing, high-performance networking, or large-scale data movement, including experience designing and building complex systems.
  • Strong programming background in C++, Python, and ideally CUDA or other GPU programming models, with a track record of building production-quality performance-critical software.
  • Extensive hands-on experience with AI frameworks (e.g., PyTorch, TensorFlow, JAX) and a solid grasp of how communication libraries and runtime systems facilitate large-scale training and inference.
  • Demonstrated success in developing and refining high-throughput, low-latency systems, with the ability to reason across software stacks, hardware capabilities, and system bottlenecks.
  • Strong collaboration skills in a multinational, interdisciplinary setting, with the ability to work effectively with senior engineers, researchers, and partner teams.

Ways to Stand Out

  • Deep expertise with NCCL, UCX, UCC, or similar communication libraries used in large-scale AI and HPC workloads.
  • Strong background in networking and communication protocols, RDMA, collective communications, congestion-aware transport, or accelerator-aware networking.
  • Comprehensive knowledge of large model training and inference serving at scale, including communication bottlenecks, scheduling challenges, and system-level tradeoffs across compute, memory, and fabric.
  • Experience in hardware-software co-design for distributed AI systems, including contributions that advanced GPU, DPU, interconnect, or runtime capabilities.
  • Familiarity with infrastructure for deployment of LLMs or transformer-based models, including sharding, pipelining, expert parallelism, or hybrid parallelism.

Compensation & Benefits

  • Base salary ranges (specified for Poland): 221,250 PLN - 383,500 PLN for Level 3, and 292,500 PLN - 507,000 PLN for Level 4.
  • NVIDIA states that base salary is determined by location, experience, and the pay of employees in similar positions. The company also offers a comprehensive benefits package (see http://www.nvidiabenefits.com/).