Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Python @ 4
Algorithms @ 4
Machine Learning @ 7
Mathematics @ 6
LLM @ 4
CUDA @ 3
GPU @ 4
Deep Learning @ 4
AI @ 4
Profiling @ 4
HPC @ 3
Details
We are looking for a Senior Deep Learning Performance Architect to join a team that pushes the boundaries of AI inference performance through hardware-software co-design. In this role, you will develop performance strategies, guide GPU architecture decisions, and lead AI efficiency innovation, with an emphasis on modeling LLM performance and optimizing AI inference workloads.
Responsibilities
- Design novel GPU and system architectures to advance AI inference performance and efficiency.
- Construct, investigate, and test popular deep learning algorithms and applications.
- Understand and analyze the relationship between hardware and software architectures and their influence on future algorithms and applications.
- Build efficient power and performance models of the AI inference stack, capturing minimal but significant information to guide next-generation hardware architecture.
- Collaborate across the company with software, research, and product teams to guide the direction of AI.
Requirements
- MS or PhD in a relevant field (Computer Science, Electrical Engineering, Mathematics) or equivalent experience, with 5+ years of relevant experience.
- Strong mathematical foundation in machine learning and deep learning.
- Expert programming skills in C, C++, and/or Python.
- Familiarity with GPU computing (CUDA or similar) and HPC stack (MPI, OpenMP).
- Strong knowledge and coursework in computer architecture.
Ways to stand out
- Background in systems-level performance modeling, profiling, and analysis.
- Experience characterizing and modeling system-level performance, performing comparison studies, and documenting/publishing results.
- Experience improving AI inference workloads by developing CUDA kernels or compilers for custom ASIC hardware.
Compensation & Benefits
- Base salary ranges (location- and level-dependent):
  - Level 3: 152,000 USD - 241,500 USD per year
  - Level 4: 184,000 USD - 287,500 USD per year
- Eligible for equity and company benefits.
Additional information
- Work model: hybrid. Listed job locations: United States, Canada, and Remote.
- Applications accepted at least until March 16, 2026.
- NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to diversity.