Senior HPC Support Engineer, InfiniBand - NVLink
Used Tools & Technologies
Not specified
Required Skills & Competences
Marketing @ 4, System Administration @ 4, Linux @ 4, Python @ 4, R @ 4, AWS @ 4, Bash @ 4, Networking @ 4, Debugging @ 4, Customer Support @ 6, ChatGPT @ 4, GPU @ 4
Details
We are seeking a motivated Senior HPC Technical Support Engineer, AI Infrastructure, focused on InfiniBand, NVLink, and AI GPU cluster technology and passionate about data center and networking technologies. You will provide comprehensive solutions for sophisticated installations, maintenance, and operations across a broad range of groundbreaking networking products. As a primary point of contact for our customers, you will answer their technical questions and help them debug and resolve their issues. As a member of our Technical Support team, you are a conscientious, proficient communicator who takes ownership of resolving issues while maintaining a high level of customer satisfaction. The role also involves regular interaction with Engineering, Marketing, and Support teams on technical matters.
Responsibilities
- Resolve sophisticated customer concerns and technical issues through research, reproduction, and problem solving for customers installing and supporting systems using Linux (multi-distro), with focus on NVIDIA InfiniBand, NVLink, GPU technology and end-to-end solutions.
- Respond to customer product support inquiries via telephone, email, or conference calls.
- Resolve customer issues during installation, operation, maintenance, or with product application/interoperability with other vendors.
- Participate in cross-functional team meetings and provide feedback to engineering and marketing regarding product requirements, customer experience, and support tools.
- Act as a technical resource: develop, refine, and document standard methodologies and share them with internal teams (Support/R&D) to improve support processes.
- Conduct site visits and conference calls with customers.
Requirements
- 5+ years providing in-depth customer support and debugging for hardware and software products.
- Exceptional interpersonal skills and ownership of issue resolution for critical customer problems.
- Linux OS experience including system administration and networking (LFCS / RHCSA level).
- Networking technologies, protocols, and routing, including IP, L2, and L3 (CCNP / CompTIA Network+ / Cloud+ level).
- Containerized solutions experience (DCA and/or CKA level), virtualization (KVM / ESXi), and cloud infrastructure (AWS / OCI).
- Ability to debug networking protocols using tools such as tcpdump and Wireshark, or similar packet capture, generation, and analysis tools.
- Bash and Python scripting abilities.
- Strong organizational skills; able to prioritize and multi-task with limited supervision.
- Experience integrating AI tools (Cursor, Gemini, ChatGPT, Copilot, Glean, etc.) into your daily workflow.
- Four-year degree in Computer Science, or Electrical or Computer Engineering, or equivalent experience.
Ways to stand out
- NVIDIA certifications related to AI infrastructure, operations and networking.
- Deep knowledge of InfiniBand, RDMA, NVLink and NVIDIA GPU technology.
- Experience with clustering or HPC data-center technologies including upper-layer protocols (MPI, NCCL).
- Additional OS experience such as Microsoft Windows, VMware, Unix.
- Configuration and operational expertise with traditional network switches and routers, as well as open networking platforms.
Compensation & Benefits
- Base salary range (determined by location, experience, and comparator pay):
  - Level 3: 108,000 USD - 172,500 USD
  - Level 4: 120,000 USD - 201,250 USD
- Eligible for equity and benefits (see NVIDIA benefits page: https://www.nvidia.com/en-us/benefits/).
Other details
- Applications for this job will be accepted at least until September 12, 2025.
- NVIDIA is an equal opportunity employer committed to fostering a diverse work environment and does not discriminate on the basis of legally protected characteristics.