Senior Deep Learning Performance Architect - LPU

at NVIDIA

📍 World
📍 Canada
📍 United States

USD 152,000-287,500 per year

SENIOR

✅ Remote ✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level value. About proficiency levels:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

- Python @ 4
- Algorithms @ 4
- Machine Learning @ 7
- Mathematics @ 6
- LLM @ 4
- CUDA @ 3
- GPU @ 4
- Deep Learning @ 4
- AI @ 4
- Profiling @ 4
- HPC @ 3

Details

We are looking for a Senior Deep Learning Performance Architect to join a team focused on pushing AI inference performance boundaries through hardware-software co-design. This role will develop performance strategies, guide GPU architecture decisions, and lead AI efficiency innovation, with emphasis on modeling LLM performance and optimizing AI inference workloads.

Responsibilities

Design novel GPU and system architectures to advance AI inference performance and efficiency.
Construct, investigate, and test popular deep learning algorithms and applications.
Understand and analyze the relationship between hardware and software architectures and their influence on future algorithms and applications.
Build efficient power and performance models of the AI inference stack, capturing minimal but significant information to guide next-generation hardware architecture.
Collaborate across the company with software, research, and product teams to guide the direction of AI.

Requirements

MS or PhD in a relevant field (Computer Science, Electrical Engineering, Mathematics) or equivalent experience, with 5+ years of relevant experience.
Strong mathematical foundation in machine learning and deep learning.
Expert programming skills in C, C++, and/or Python.
Familiarity with GPU computing (CUDA or similar) and HPC stack (MPI, OpenMP).
Strong knowledge and coursework in computer architecture.

Ways to stand out

Background in systems-level performance modeling, profiling, and analysis.
Experience characterizing and modeling system-level performance, performing comparison studies, and documenting/publishing results.
Experience improving AI inference workloads by developing CUDA kernels or compilers for custom ASIC hardware.

Compensation & Benefits

Base salary ranges (location- and level-dependent):
- Level 3: 152,000 USD - 241,500 USD per year
- Level 4: 184,000 USD - 287,500 USD per year
Eligible for equity and company benefits.

Additional information

Work model: hybrid. The listed job locations are the United States, Canada, and Remote.
Applications accepted at least until March 16, 2026.
NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to diversity.