Senior HPC Architect, Networking

at NVIDIA
USD 148,000-287,500 per year
SENIOR
✅ On-site

Required Skills & Competences

System Administration @ 4, Linux @ 4, Python @ 4, Machine Learning @ 4, Bash @ 4, Communication @ 7, Mathematics @ 4, Networking @ 4, Debugging @ 4, GPU @ 4

Details

NVIDIA seeks an outstanding hands-on architect/engineer to support deployment and bring-up of large-scale GPU compute clusters. You will enable the latest GPU computing and networking products, contribute to AI and GPU computing breakthroughs, and implement at-scale system administration and tuning mechanisms for large-scale compute runs. You will work with accelerated computing and deep learning software and hardware platforms and collaborate with researchers, developers, and customers to craft improved workflows and leading solutions.

Responsibilities

  • Provide engineering solutions to operationalize GPU computing and networking products and software stacks.
  • Maintain technical relationships with internal and external engineering teams.
  • Assist systems and machine learning/deep learning engineers in building solutions based on NVIDIA technology.
  • Act as an internal reference for system administration, at-scale fabric health and performance analysis, and datacenter and large-scale GPU-accelerated system solutions.
  • Architect, develop, and bring up large-scale performance platforms in collaboration with HPC, OS, GPU compute, and systems specialists.

Requirements

  • 5+ years of experience using accelerated computing for datacenter/HPC-based enterprise computing solutions.
  • Solid understanding of accelerated computing scheduling and I/O stacks.
  • Programming/scripting experience in C/C++, Python, and Bash.
  • Experience working with engineering or academic research communities supporting high performance computing or deep learning.
  • Experience with parallel filesystems.
  • Experience deploying and maintaining high-speed networks (e.g., InfiniBand or Ethernet) for compute and storage traffic.
  • Strong teamwork and communication skills (verbal and written).
  • Ability to multitask effectively in a dynamic environment.
  • BS (or equivalent experience) in Engineering, Mathematics, Physics, or Computer Science. MS or PhD desirable.

Ways to stand out

  • Hands-on experience debugging large-scale InfiniBand/Ethernet/NVLink fabrics.
  • Experience with Spectrum-X fabric deployments.
  • Deep learning framework skills.
  • Exposure to deploying telemetry and visualization pipelines.
  • Exposure to container technology and Linux performance tools.

Benefits

  • Base salary ranges by level:
    • Level 3: 148,000 USD - 235,750 USD
    • Level 4: 184,000 USD - 287,500 USD
  • Eligible for equity and a comprehensive benefits package (see NVIDIA benefits).

Additional information

  • Applications accepted at least until July 29, 2025.
  • NVIDIA is an equal opportunity employer committed to diversity and non-discrimination.