Senior HPC Support Engineer, InfiniBand - NVLink
at NVIDIA
Seattle, United States
USD 108,000-201,250 per year
Required Skills & Competences
Marketing @ 4, System Administration @ 7, Linux @ 4, Python @ 4, R @ 4, AWS @ 4, Bash @ 4, Networking @ 4, Debugging @ 6, Customer Support @ 4, ChatGPT @ 4, GPU @ 4
Details
We are seeking a motivated Senior HPC Technical Support Engineer - AI Infrastructure focusing on InfiniBand, NVLink and AI GPU Cluster technology. You will provide comprehensive solutions for sophisticated installations, maintenance, and operations for a broad scope of networking and GPU cluster products. As a primary point of contact for customers, you will assist with technical questions, debug and resolve issues, and interact regularly with Engineering, Marketing, and Support teams.
Responsibilities
- Resolve sophisticated customer concerns and technical issues through research, reproduction, and problem solving for customers installing and supporting systems using Linux (multi-distro).
- Provide support focused on NVIDIA InfiniBand, NVLink, NVIDIA GPU technologies and End-to-End Solutions.
- Respond to customer product support inquiries via telephone, email, or conference calls.
- Resolve customer issues during installation, operation, maintenance, and interoperability with other vendors.
- Participate in cross-functional team meetings and provide feedback to engineering and marketing regarding product requirements, customer experience, and support tools.
- Develop, refine, and document standard methodologies and support processes for internal teams (Support/R&D).
- Perform site visits and conference calls with customers.
Requirements
- 5+ years providing in-depth customer support and debugging for hardware and software products.
- Exceptional interpersonal skills; ability to own and drive resolution of critical customer issues.
- Strong Linux OS knowledge including system administration and networking (LFCS/RHCSA level).
- Networking knowledge: IP, L2 and L3 protocols and routing (CCNP/CompTIA Network+ level).
- Experience with containerized solutions (DCA and/or CKA), virtualization (KVM/ESXi), and cloud infrastructure (AWS/OCI).
- Able to debug networking protocols using tools such as tcpdump and Wireshark or similar packet generation and analysis tools.
- Bash and Python scripting abilities.
- Strong organizational skills; able to prioritize and multi-task with limited supervision.
- Ability to integrate AI tools (Cursor, Gemini, ChatGPT, Copilot, Glean, etc.) into daily workflow.
- Four-year degree from an accredited university/college, or equivalent experience in Computer Science, or Electrical or Computer Engineering.
Preferred / Ways to stand out
- NVIDIA certifications related to AI infrastructure, operations, and networking.
- Deep experience with InfiniBand, RDMA, NVLink and NVIDIA GPU technology.
- Experience with clustering or HPC data-center technologies including upper-layer protocols (MPI, NCCL).
- Additional OS experience such as Microsoft Windows, VMware, Unix.
- Configuration and operational expertise with traditional network switch/router and open platforms.
Compensation & Benefits
- Base salary ranges (determined by location, experience, and internal pay):
  - Level 3: 108,000 USD - 172,500 USD
  - Level 4: 120,000 USD - 201,250 USD
- Eligible for equity and benefits. (See company benefits page linked in original posting.)
Other details
- Location: Seattle, Washington, United States.
- Employment type: Full time.
- Applications accepted at least until September 12, 2025.
- NVIDIA is an equal opportunity employer committed to diversity and inclusion.