Senior HPC Support Engineer, InfiniBand - NVLink

at Nvidia
USD 108,000-201,250 per year
SENIOR
✅ On-site

Required Skills & Competences

Marketing @ 4, System Administration @ 7, Linux @ 4, Python @ 4, R @ 4, AWS @ 4, Bash @ 4, Networking @ 4, Debugging @ 6, Customer Support @ 4, ChatGPT @ 4, GPU @ 4

Details

We are seeking a motivated Senior HPC Technical Support Engineer - AI Infrastructure focusing on InfiniBand, NVLink and AI GPU Cluster technology. You will provide comprehensive solutions for sophisticated installations, maintenance, and operations for a broad scope of networking and GPU cluster products. As a primary point of contact for customers, you will assist with technical questions, debug and resolve issues, and interact regularly with Engineering, Marketing, and Support teams.

Responsibilities

  • Resolve sophisticated customer concerns and technical issues through research, reproduction, and problem solving for customers installing and supporting systems using Linux (multi-distro).
  • Provide support focused on NVIDIA InfiniBand, NVLink, NVIDIA GPU technologies and End-to-End Solutions.
  • Respond to customer product support inquiries via telephone, email, or conference calls.
  • Resolve customer issues during installation, operation, maintenance, and interoperability with other vendors.
  • Participate in cross-functional team meetings and provide feedback to engineering and marketing regarding product requirements, customer experience, and support tools.
  • Develop, refine, and document standard methodologies and support processes for internal teams (Support/R&D).
  • Perform site visits and conference calls with customers.

Requirements

  • 5+ years providing in-depth customer support and debugging for hardware and software products.
  • Exceptional interpersonal skills; ability to own and drive resolution of critical customer issues.
  • Strong Linux OS knowledge including system administration and networking (LFCS/RHCSA level).
  • Networking knowledge: IP, L2 and L3 protocols and routing (CCNP/CompTIA Network+ level).
  • Experience with containerized solutions (DCA and/or CKA), virtualization (KVM/ESXi), and cloud infrastructure (AWS/OCI).
  • Able to debug networking protocols using tools such as tcpdump and Wireshark or similar packet generation and analysis tools.
  • Bash and Python scripting abilities.
  • Strong organizational skills; able to prioritize and multi-task with limited supervision.
  • Ability to integrate AI tools (Cursor, Gemini, ChatGPT, Copilot, Glean, etc.) into daily workflow.
  • Four-year degree from an accredited university/college in Computer Science, Electrical Engineering, or Computer Engineering, or equivalent experience.

Preferred / Ways to stand out

  • NVIDIA certifications related to AI infrastructure, operations, and networking.
  • Deep experience with InfiniBand, RDMA, NVLink and NVIDIA GPU technology.
  • Experience with clustering or HPC data-center technologies including upper-layer protocols (MPI, NCCL).
  • Additional OS experience such as Microsoft Windows, VMware, Unix.
  • Configuration and operational expertise with traditional network switches/routers and open platforms.

Compensation & Benefits

  • Base salary ranges (determined by location, experience, and internal pay):
    • Level 3: 108,000 USD - 172,500 USD
    • Level 4: 120,000 USD - 201,250 USD
  • Eligible for equity and benefits. (See company benefits page linked in original posting.)

Other details

  • Location: Seattle, Washington, United States.
  • Employment type: Full time.
  • Applications accepted at least until September 12, 2025.
  • NVIDIA is an equal opportunity employer committed to diversity and inclusion.