Senior Performance Architect - Heterogeneous Workload Optimization

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Kubernetes @ 4, CUDA @ 4, GPU @ 4, AI @ 4, Profiling @ 4, Slurm @ 4, Performance Analysis @ 7

Details

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today NVIDIA is tapping into the unlimited potential of AI to define the next era of computing. As an NVIDIAN, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work.

As EDA workloads transition from traditional CPU-bound tasks to massively parallel GPU-accelerated engines, the complexity of identifying bottlenecks has scaled exponentially. We are seeking a Senior Systems Performance Engineer to build the next generation of profiling infrastructure. You will be responsible for measuring, analyzing, and optimizing the interaction between extensive design graphs in system memory and high-throughput kernels on the GPU.

Responsibilities

  • Architect and maintain custom profiling frameworks that provide a unified view of execution across CPU (multi-core/multi-socket) and GPU (multi-node/NVLink) environments.
  • Conduct deep-dive benchmarking of EDA applications to characterize memory access patterns, cache hit rates, and instruction-level parallelism.
  • Use GPU profilers to detect GPU-side inefficiencies such as warp divergence, sub-optimal occupancy, and PCIe/NVLink bottlenecks.
  • Develop tools to monitor and attribute high-watermark memory usage in multi-terabyte EDA builds, finding opportunities for data structure compression or smarter memory pooling.
  • Develop predictive models to guide hardware procurement and cloud instance selection based on design gate count and algorithmic complexity.

Requirements

  • A grasp of the CUDA programming model and experience employing GPU profiling tools like NVIDIA Nsight Systems/Compute to address PCIe bottlenecks and kernel stalls.
  • Extensive knowledge of profiling tools such as perf, eBPF, VTune, or Valgrind, along with insight into their internal mechanisms.
  • A passion for meticulous benchmarking and the ability to distill sophisticated performance data into actionable engineering roadmaps.
  • Experience with distributed compute environments (Slurm, LSF, or Kubernetes).
  • BS, MS, or PhD in Computer Science, Electrical Engineering, or a related field (or equivalent experience), with 8+ years of relevant experience, including at least 5 years in systems-level performance analysis.

Compensation and benefits

  • Base salary ranges depend on location and level: 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
  • The role is also eligible for equity and benefits.

Additional information

  • Applications for this job will be accepted at least until February 16, 2026.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment.