Required Skills & Competences
- Ansible @ 3
- Docker @ 3
- Kubernetes @ 3
- Python @ 6
- TensorFlow @ 4
- Communication @ 6
- Networking @ 4
- Parallel Programming @ 6
- Debugging @ 4
- System Architecture @ 7
- PyTorch @ 4
- CUDA @ 3
- GPU @ 4
Details
NVIDIA's GPU Communications Libraries and Networking team delivers libraries such as NCCL, NVSHMEM and UCX for Deep Learning and HPC. We are seeking a motivated performance engineer to influence the roadmap of our communication libraries and improve communication performance across multi-GPU and multi-node clusters. This role focuses on performance characterization, analysis, tooling, triage and collaboration across hardware and software stacks.
Responsibilities
- Conduct in-depth performance characterization and analysis on large multi-GPU and multi-node clusters.
- Study interactions of libraries with hardware (GPU, CPU, networking) and software components across the stack.
- Evaluate proof-of-concepts and perform trade-off analysis for alternative solutions.
- Triage and root-cause performance issues reported by customers.
- Collect large volumes of performance data; build tools and infrastructure to visualize and analyze it (a minimal analysis sketch follows this list).
- Collaborate with a dynamic, cross-time-zone team.
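As an illustration of the data-collection and analysis work described above, the sketch below aggregates per-rank latency samples into summary statistics. It is a minimal sketch only: the file pattern, CSV layout and column names (rank, msg_bytes, latency_us) are assumptions made for this example, not part of any NVIDIA tooling.

```python
"""Aggregate per-rank latency samples into summary statistics.

Assumes each input CSV has columns: rank, msg_bytes, latency_us
(a hypothetical layout chosen for this sketch).
"""
import csv
import glob
import statistics
from collections import defaultdict


def load_samples(pattern="results/*.csv"):
    """Read latency samples from all matching CSVs, grouped by message size."""
    samples = defaultdict(list)
    for path in glob.glob(pattern):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                samples[int(row["msg_bytes"])].append(float(row["latency_us"]))
    return samples


def summarize(samples):
    """Print mean / p50 / p99 latency per message size."""
    print(f"{'bytes':>10} {'mean_us':>10} {'p50_us':>10} {'p99_us':>10}")
    for size in sorted(samples):
        vals = sorted(samples[size])
        p99 = vals[min(len(vals) - 1, int(len(vals) * 0.99))]
        print(f"{size:>10} {statistics.mean(vals):>10.2f} "
              f"{statistics.median(vals):>10.2f} {p99:>10.2f}")


if __name__ == "__main__":
    summarize(load_samples())
```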
Requirements
- M.S. (or equivalent experience) or Ph.D. in Computer Science or a related field with relevant performance engineering and HPC experience.
- 3+ years of experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM).
- Experience conducting performance benchmarking and triage on large-scale HPC clusters.
- Strong understanding of computer system architecture, hardware-software interactions and operating systems principles.
- Ability to implement micro-benchmarks in C/C++ and to read and modify existing code bases (a minimal benchmarking sketch follows this list).
- Ability to debug performance issues across the entire HW/SW stack.
- Proficient in a scripting language, preferably Python.
- Familiarity with containers, cloud provisioning and scheduling tools (Kubernetes, SLURM, Ansible, Docker).
- Adaptability and willingness to learn new tools and areas; ability to work and communicate effectively across teams and time zones.
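As an illustration of the micro-benchmarking pattern referenced above (warm-up, timed iterations, derived latency), here is a minimal two-rank ping-pong sketch using mpi4py. It is a sketch of the measurement pattern only; the message size and iteration counts are arbitrary choices, and the benchmarks named in this posting are implemented in C/C++ against runtimes such as MPI, NCCL, UCX and NVSHMEM.

```python
"""Minimal two-rank ping-pong latency micro-benchmark (mpi4py sketch).

Run with, for example: mpirun -np 2 python pingpong.py
Message size and iteration counts are arbitrary illustrative choices.
"""
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

MSG_BYTES = 1 << 20      # 1 MiB payload
WARMUP, ITERS = 10, 100
buf = np.zeros(MSG_BYTES, dtype=np.uint8)

comm.Barrier()
start = MPI.Wtime()
for i in range(WARMUP + ITERS):
    if i == WARMUP:            # exclude warm-up iterations from the timing
        comm.Barrier()
        start = MPI.Wtime()
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=0)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=0)
elapsed = MPI.Wtime() - start

if rank == 0:
    # One iteration is a full round trip; report half of it as one-way latency.
    print(f"one-way latency: {elapsed / ITERS / 2 * 1e6:.2f} us "
          f"for {MSG_BYTES} bytes")
```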
Ways to stand out
- Practical experience with InfiniBand/Ethernet networks (RDMA, topologies, congestion control).
- Experience debugging network issues in large-scale deployments.
- Familiarity with CUDA programming and/or GPUs.
- Experience with deep learning frameworks such as PyTorch or TensorFlow.
Compensation and benefits
- Base salary ranges by location and level:
  - Level 3: 148,000 USD - 235,750 USD
  - Level 4: 184,000 USD - 287,500 USD
- Eligible for equity and benefits (link to NVIDIA benefits).
Other information
- Applications accepted at least until August 12, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.