Senior Performance Engineer - Deep Learning

at NVIDIA
USD 152,000-287,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Python @ 6, Hiring @ 7, Communication @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4, Deep Learning @ 7, AI @ 7, Profiling @ 4, Performance Analysis @ 4

Details

Our Deep Learning model performance engineering team at NVIDIA is hiring software engineers at all experience levels. The team builds and optimizes the libraries and tools that help Deep Learning researchers and engineers design, develop, and deploy efficient AI applications. It builds optimizations directly into mainstream open-source Deep Learning frameworks (PyTorch and JAX) to boost performance across NVIDIA's AI stack, and it collaborates with other NVIDIA teams and the open-source community.

Responsibilities

  • Build and support Transformer Engine, the open-source library for accelerating the training of Large Language Models.
  • Collaborate on systems research that improves Deep Learning model performance (e.g., extremely low precision training, parallelism methods).
  • Implement, benchmark, and optimize new Deep Learning models (such as LLMs) to scale efficiently on NVIDIA GPUs and systems.
  • Build and contribute to NVIDIA submissions on community benchmarks such as MLPerf.
  • Engage with the open-source community and support enterprise customers and partners in realizing the benefits of NVIDIA hardware and software innovations.
  • Influence the design of new hardware generations and core platform software components for NVIDIA hardware and systems.

Requirements

  • BS or equivalent experience in Computer Science, Electrical Engineering, or a related field.
  • 3+ years of experience in C++ and Python programming.
  • Strong background, experience, or coursework in parallel systems programming, preferably on GPUs.
  • Knowledge of Computer Architecture, Code Optimization, and/or Operating Systems.
  • Proven experience in developing large software projects.
  • Excellent verbal and written communication skills.

Preferred / Ways to Stand Out

  • Experience in PyTorch, JAX, or other deep learning frameworks.
  • Experience with performance analysis, profiling, and code optimization techniques, especially with multi-GPU or multi-node systems.
  • Knowledge of modern LLM architectures, attention mechanisms, and/or low-level DL libraries such as cuBLAS, cuDNN, and cuSOLVER.
  • Experience writing GPU kernels using CUDA, OpenAI Triton, CuTeDSL, Pallas, or similar libraries.
  • Past contributions to open source projects and experience working with multidisciplinary teams.

Compensation & Other Details

  • Base salary ranges (location and level dependent):
    • Level 3: 152,000 USD - 241,500 USD
    • Level 4: 184,000 USD - 287,500 USD
  • You will also be eligible for equity and benefits: https://www.nvidia.com/en-us/benefits/
  • Applications for this job will be accepted at least until March 8, 2026.
  • This posting is for an existing vacancy.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.