Senior Software Architect - Deep Learning and HPC Communications

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Linux @ 7, Algorithms @ 4, TensorFlow @ 7, Communication @ 4, Networking @ 4, Parallel Programming @ 4, Debugging @ 4, System Architecture @ 7, PyTorch @ 7, CUDA @ 4, GPU @ 4

Details

NVIDIA's GPU Communications Libraries and Networking team builds communication libraries (NCCL, NVSHMEM, UCX) that are crucial for scaling Deep Learning and HPC. This role is for a Senior Software Architect to co-design next-generation data center platforms and scalable communications software that accelerate AI and HPC workloads.

Responsibilities

  • Investigate opportunities to improve communication performance by identifying bottlenecks in today’s systems.
  • Design and implement new communication technologies to accelerate AI and HPC workloads.
  • Explore innovative hardware and software solutions for next-generation platforms as part of co-design efforts with GPU, networking, and software architects.
  • Build proofs-of-concept, conduct experiments, and perform quantitative modeling to evaluate and drive new innovations.
  • Use simulation to explore the performance of large GPU clusters, at scales from hundreds to hundreds of thousands of GPUs.

Requirements

  • M.S. or Ph.D. degree in Computer Science or Computer Engineering, or equivalent experience.
  • 5+ years of relevant experience.
  • Excellent C/C++ programming and debugging skills.
  • Experience with parallel programming models (MPI, SHMEM) and at least one communication runtime (MPI, NCCL, NVSHMEM, OpenSHMEM, UCX, UCC).
  • Deep understanding of operating systems and of computer and system architecture.
  • Solid fundamentals of network architecture, topology, algorithms, and communication scaling relevant to AI and HPC workloads.
  • Strong experience with Linux.
  • Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.

Ways to stand out

  • Expertise in related technology and passion for the domain; experience with CUDA programming and NVIDIA GPUs.
  • Knowledge of high-performance networks such as InfiniBand, RoCE, NVLink, and other interconnects.
  • Experience with Deep Learning frameworks (PyTorch, TensorFlow) and knowledge of deep learning parallelism strategies and how they map to the communication subsystem.
  • Experience with HPC applications and a demonstrated ability to guide and influence multi-functional teams.

Compensation and benefits

  • Base salary range (determined by location, experience, and the pay of employees in similar positions):
    • Level 4: 184,000 USD - 287,500 USD.
    • Level 5: 224,000 USD - 356,500 USD.
  • Eligible for equity and benefits (see NVIDIA benefits).

Additional information

  • Location: Santa Clara, CA, United States (Full time).
  • Applications accepted at least until August 13, 2025.
  • NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment.