Senior Software Architect - Deep Learning and HPC Communications

at NVIDIA

📍 Santa Clara, United States

USD 184,000-356,500 per year

SENIOR

✅ On-site

Required Skills & Competences

Linux @ 7, Algorithms @ 4, TensorFlow @ 7, Communication @ 4, Networking @ 4, Parallel Programming @ 4, Debugging @ 4, System Architecture @ 7, PyTorch @ 7, CUDA @ 4, GPU @ 4

Details

NVIDIA's GPU Communications Libraries and Networking team builds communication libraries (NCCL, NVSHMEM, UCX) that are crucial for scaling Deep Learning and HPC. This role is for a Senior Software Architect to co-design next-generation data center platforms and scalable communications software that accelerate AI and HPC workloads.

Responsibilities

Investigate opportunities to improve communication performance by identifying bottlenecks in today’s systems.
Design and implement new communication technologies to accelerate AI and HPC workloads.
Explore innovative hardware and software solutions for next-generation platforms as part of co-design efforts with GPU, networking, and software architects.
Build proofs-of-concept, conduct experiments, and perform quantitative modeling to evaluate and drive new innovations.
Use simulation to explore the performance of large GPU clusters, at scales from hundreds to hundreds of thousands of GPUs.

Requirements

M.S. or Ph.D. degree in Computer Science, Computer Engineering, or equivalent experience.
5+ years of relevant experience.
Excellent C/C++ programming and debugging skills.
Experience with parallel programming models (MPI, SHMEM) and at least one communication runtime (MPI, NCCL, NVSHMEM, OpenSHMEM, UCX, UCC).
Deep understanding of operating systems and of computer and system architecture.
Solid fundamentals of network architecture, topology, algorithms, and communication scaling relevant to AI and HPC workloads.
Strong experience with Linux.
Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.

Ways to stand out

Expertise in related technology and passion for the domain; experience with CUDA programming and NVIDIA GPUs.
Knowledge of high-performance networks such as InfiniBand, RoCE, NVLink, and other interconnects.
Experience with Deep Learning frameworks (PyTorch, TensorFlow) and knowledge of deep learning parallelism strategies and how they map to the communication subsystem.
Experience with HPC applications and demonstrated ability to guide and influence in multi-functional teams.

Compensation and benefits

Base salary range:
- Level 4: 184,000 USD - 287,500 USD (base salary determined by location, experience, and pay of employees in similar positions).
- Level 5: 224,000 USD - 356,500 USD.
Eligible for equity and benefits (see NVIDIA benefits).

Additional information

Location: Santa Clara, CA, United States (Full time).
Applications accepted at least until August 13, 2025.
NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment.