Senior GPU Kernel Performance Lead

at Nvidia
USD 224,000-425,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Python @ 4, Debugging @ 7, CUDA @ 4, GPU @ 4

Details

We are looking for a Senior GPU Kernel Performance Lead to analyze, validate, and report on the performance of the GPU math kernels used to accelerate deep learning models. The team delivers libraries such as cuDNN, cuBLAS, and TensorRT and works on Tensor Cores and CUDA-based solutions. While there may be opportunities for hands-on development, this role primarily leads a team focused on validating and improving kernel performance.

Responsibilities

  • Specify test cases derived from deep learning workloads to provide directed and use-case coverage across kernels on both simulation and silicon targets.
  • Determine theoretical performance by developing and using analytical models.
  • Track and report on kernel performance throughout the development lifecycle by using and expanding current infrastructure.
  • Provide feedback to kernel developers by identifying performance regressions and opportunities to reach achievable peak performance.

Requirements

  • PhD in Computer Science, Computer Engineering, Applied Math, or a related field (or equivalent experience) with 8+ years of relevant industry experience.
  • Demonstrated strong C++ programming and software design skills, including debugging, performance analysis, and test design.
  • Experience leading or managing a team related to the performance of CPUs, GPUs, or other deep learning accelerators.
  • Experience with performance analysis and tracking across development lifecycles and both simulation and silicon targets.

Ways to stand out

  • Experience with analytical models and cycle-accurate hardware simulators.
  • Knowledge of performance tools such as Nsight or Intel VTune.
  • Programming experience beyond C++ including assembly, MLIR/LLVM, Python, and CUDA/OpenCL.
  • Familiarity with CUTLASS (open-source), Tensor Cores, and deep learning workloads.

Compensation & Benefits

  • Base salary range:
    • Level 5: 224,000 USD - 356,500 USD
    • Level 6: 272,000 USD - 425,500 USD
  • You will also be eligible for equity and benefits.

Additional information

  • Applications for this job will be accepted at least until July 29, 2025.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment. The company does not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.