Senior Solutions Architect, Cluster Design And Architecture - Networking

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Distributed Systems @ 4
  • Communication @ 4
  • Networking @ 4
  • Performance Optimization @ 4
  • Debugging @ 4
  • GPU @ 4

Details

NVIDIA is building the world’s most groundbreaking and innovative accelerated computing platforms for AI and HPC. As AI workloads scale to unprecedented levels, the network is the backbone that makes large compute clusters possible. In this role you will assist with designs and architectures for next-generation networking solutions that connect thousands of GPUs and enable advanced AI supercomputers and enterprise AI infrastructure.

Responsibilities

  • Act as a key technical expert connecting NVIDIA's new networking technology builds (including InfiniBand, Spectrum-X, NVLink, and software solutions).
  • Partner with internal engineering efforts in GPU cluster building and networking, and convey architecture information and guidelines directly to customers and field teams.
  • Guide field teams and customers in cluster design, balancing design principles with situational limitations to create performant and supportable GPU clusters.
  • Work closely with field teams to ensure successful first deployments of new products, including new network architectures and topologies.
  • Provide feedback from customers/field on networking development and workflows to engineering teams and contribute to customer-facing documentation and reference material.
  • Provide hands-on assistance to field teams debugging network build, configuration, and performance issues, drawing on internal engineering expertise and known bugs.
  • Work on end-to-end cluster design, network topology and architecture optimization, and performance modeling and validation.

Requirements

  • BS, MS, or PhD in Computer Science, Electrical Engineering, Computer Engineering, Physics, or related field (or equivalent experience).
  • 8+ years of experience in network architecture, network design, network validation, and troubleshooting.
  • Proven expertise in designing large-scale distributed systems, AI clusters, or HPC infrastructure.
  • Ability to translate complex engineering concepts into customer-ready documentation, diagrams, and reference material.

Ways to stand out

  • Experience leading large-scale AI factory or HPC cluster bring-ups or builds.
  • Hands-on experience with NVIDIA networking products including (but not limited to) InfiniBand, Spectrum-X, and BlueField.
  • Knowledge of NCCL, MPI, and collective communication patterns in distributed training as they pertain to networking patterns and design.
  • Background in network performance optimization, congestion control, and validation at scale.
  • External customer-facing experience and strong communication skills.

Compensation and benefits

  • Base salary range: 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
  • Eligible for equity and benefits (see NVIDIA benefits).

Additional information

  • Applications for this job will be accepted at least until December 6, 2025.
  • NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment.