Used Tools & Technologies
Not specified
Required Skills & Competences
Software Development @ 4, Docker @ 4, Kubernetes @ 4, Linux @ 8, Python @ 6, R @ 4, Communication @ 4, Networking @ 4, Debugging @ 4, Customer Support @ 4, CUDA @ 4
Details
The NVIDIA Enterprise Experience (NVEX) Solutions Engineering team is looking for a senior Computer or Software Engineer to become an authority in network technology used in AI clusters. The role focuses on providing the highest level of support for InfiniBand, NVLink, and Spectrum-X network systems that interconnect GPUs and AI compute infrastructure, bridging customer support teams and R&D to resolve tough production issues.
Responsibilities
- Assist network and AI cluster support teams in reproducing, resolving, and root-causing sophisticated customer issues.
- Work with R&D teams to develop bug fixes, workarounds, and solutions for critical customers using NVIDIA's network technologies.
- Become an authority in NVIDIA network technologies used in AI clusters such as InfiniBand, NVLink, and Spectrum-X.
- Analyze network performance metrics and make tuning recommendations for high-performance, lossless networks.
- Develop support and analysis tools to help analyze and root-cause field issues.
- Use AI tools daily for software development, log and trace analysis, and source code debugging.
- Occasionally work weekends or holidays to support customers.
Requirements
- Minimum of a BS in Computer, Electrical, or Software Engineering (or equivalent experience).
- 5-10 years of experience programming in C on Linux and embedded systems.
- Proficiency in Python.
- At least 5 years of experience developing software for one or more of the following: Linux NIC drivers, switch ASICs and SDKs, embedded network device firmware, Linux-based network equipment (routers, switches, gateways), network operating systems, virtual routers, SDN stacks, virtual switching, DPDK, SR-IOV stacks.
- At least 5 years of experience directly supporting end-customers, partners, or integrators for network equipment and infrastructures.
- Strong system software expertise (firmware, BIOS, kernel, driver, operating system).
- Experience with container environments (Kubernetes and Docker).
- Professional-level communication skills, including the ability to adjust to the audience's technical level and to remain calm in negative situations.
- Passion for learning innovative tech and motivation to work on ground-breaking products.
Ways to stand out
- Background with AI infrastructure and HPC networking.
- Experience programming switch and NIC ASICs and SDKs.
- Experience with InfiniBand or other non-Ethernet network technologies.
- Experience developing or supporting DPUs or SmartNICs.
- Knowledge of HPC performance test tools and NVIDIA AI stacks (NCCL, MPI, DOCA, CUDA).
Compensation & Benefits
- Base salary range (Level 3): 136,000 USD - 212,750 USD.
- Base salary range (Level 4): 168,000 USD - 264,500 USD.
- You will also be eligible for equity and benefits (see NVIDIA benefits page).
Additional information
- Applications for this job will be accepted at least until August 11, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.