Senior Software Architect - Deep Learning and HPC Communications
at NVIDIA
USD 184,000-356,500 per year
Required Skills & Competences
Linux @ 7, Algorithms @ 4, TensorFlow @ 7, Communication @ 4, Networking @ 4, Parallel Programming @ 4, Debugging @ 4, System Architecture @ 7, PyTorch @ 7, CUDA @ 4, GPU @ 4
Details
NVIDIA's GPU Communications Libraries and Networking team builds communication libraries (NCCL, NVSHMEM, UCX) that are crucial for scaling Deep Learning and HPC. This role is for a Senior Software Architect to co-design next-generation data center platforms and scalable communications software that accelerate AI and HPC workloads.
Responsibilities
- Investigate opportunities to improve communication performance by identifying bottlenecks in today’s systems.
- Design and implement new communication technologies to accelerate AI and HPC workloads.
- Explore innovative hardware and software solutions for next-generation platforms as part of co-design efforts with GPU, networking, and software architects.
- Build proofs-of-concept, conduct experiments, and perform quantitative modeling to evaluate and drive new innovations.
- Use simulation to explore performance of large GPU clusters (scales of hundreds to hundreds of thousands of GPUs).
Requirements
- M.S. or Ph.D. degree in Computer Science, Computer Engineering, or equivalent experience.
- 5+ years of relevant experience.
- Excellent C/C++ programming and debugging skills.
- Experience with parallel programming models (MPI, SHMEM) and at least one communication runtime (MPI, NCCL, NVSHMEM, OpenSHMEM, UCX, UCC).
- Deep understanding of operating systems and of computer and system architecture.
- Solid fundamentals of network architecture, topology, algorithms, and communication scaling relevant to AI and HPC workloads.
- Strong experience with Linux.
- Ability and flexibility to work and communicate effectively in a multi-national, multi-time-zone corporate environment.
Ways to stand out
- Expertise in related technology and passion for the domain; experience with CUDA programming and NVIDIA GPUs.
- Knowledge of high-performance networks such as InfiniBand, RoCE, NVLink, and other interconnects.
- Experience with Deep Learning frameworks (PyTorch, TensorFlow) and knowledge of deep learning parallelisms and mapping to the communication subsystem.
- Experience with HPC applications and demonstrated ability to guide and influence in multi-functional teams.
Compensation and benefits
- Base salary range (base salary determined by location, experience, and pay of employees in similar positions):
  - Level 4: 184,000 USD - 287,500 USD.
  - Level 5: 224,000 USD - 356,500 USD.
- Eligible for equity and benefits (see NVIDIA benefits).
Additional information
- Location: Santa Clara, CA, United States (Full time).
- Applications accepted at least until August 13, 2025.
- NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment.