Senior Performance Engineer - Deep Learning

at NVIDIA

📍 Santa Clara, United States

USD 152,000-287,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The proficiency levels are:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and know all the pitfalls and tricks.

10: exceptional knowledge. Comprehensive understanding and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Python @ 6, Hiring @ 7, Communication @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4, Deep Learning @ 7, AI @ 7, Profiling @ 4, Performance Analysis @ 4

Details

NVIDIA's Deep Learning model performance engineering team is hiring software engineers at all experience levels to build and optimize the libraries and tools that enable Deep Learning researchers and engineers to design, develop, and deploy efficient AI applications. The team builds optimizations directly into mainstream open-source Deep Learning frameworks (PyTorch and JAX) to boost performance across NVIDIA's AI stack, and collaborates with other NVIDIA teams and the open-source community.

Responsibilities

Build and support Transformer Engine, the open-source library for accelerating the training of Large Language Models.
Collaborate on systems research that improves Deep Learning model performance (e.g., extremely low precision training, parallelism methods).
Implement, benchmark, and optimize new Deep Learning models (such as LLMs) to scale efficiently on NVIDIA GPUs and systems.
Build and contribute to NVIDIA submissions on community benchmarks such as MLPerf.
Engage with the open-source community and support enterprise customers and partners to deliver the benefits of NVIDIA hardware and software innovations.
Influence the design of new hardware generations and core platform software components for NVIDIA hardware and systems.

Requirements

BS or equivalent experience in Computer Science, Electrical Engineering, or a related field.
3+ years of experience in C++ and Python programming.
Strong background, experience, or coursework in parallel systems programming, preferably on GPUs.
Knowledge of Computer Architecture, Code Optimization, and/or Operating Systems.
Proven experience in developing large software projects.
Excellent verbal and written communication skills.

Preferred / Ways to Stand Out

Experience in PyTorch, JAX, or other deep learning frameworks.
Experience with performance analysis, profiling, and code optimization techniques, especially with multi-GPU or multi-node systems.
Knowledge of modern LLM architectures, attention mechanisms, and/or low-level DL libraries such as cuBLAS, cuDNN, and cuSOLVER.
Experience writing GPU kernels using CUDA, OpenAI Triton, CuTeDSL, Pallas, or similar libraries.
Past contributions to open-source projects and experience working with multidisciplinary teams.

Compensation & Other Details

Base salary ranges (location and level dependent):
- Level 3: 152,000 USD - 241,500 USD
- Level 4: 184,000 USD - 287,500 USD
You will also be eligible for equity and benefits: https://www.nvidia.com/en-us/benefits/
Applications for this job will be accepted at least until March 8, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.