Senior Deep Learning Performance Architect - LPU

at NVIDIA
📍 World
📍 Canada
📍 United States
USD 152,000-287,500 per year
SENIOR
✅ Remote ✅ Hybrid

Required Skills & Competences

Python @ 4, Algorithms @ 4, Machine Learning @ 7, Mathematics @ 6, LLM @ 4, CUDA @ 3, GPU @ 4, Deep Learning @ 4, AI @ 4, Profiling @ 4, HPC @ 3

Details

We are looking for a Senior Deep Learning Performance Architect to join a team focused on pushing AI inference performance boundaries through hardware-software co-design. This role will develop performance strategies, guide GPU architecture decisions, and lead AI efficiency innovation, with emphasis on modeling LLM performance and optimizing AI inference workloads.

Responsibilities

  • Design novel GPU and system architectures to advance AI inference performance and efficiency.
  • Construct, investigate, and test popular deep learning algorithms and applications.
  • Understand and analyze the relationship between hardware and software architectures and their influence on future algorithms and applications.
  • Build efficient power and performance models of the AI inference stack that capture only the most significant information needed to guide next-generation hardware architecture.
  • Collaborate across the company with software, research, and product teams to guide the direction of AI.

Requirements

  • MS or PhD in a relevant field (Computer Science, Electrical Engineering, Mathematics) or equivalent experience, with 5+ years of relevant experience.
  • Strong mathematical foundation in machine learning and deep learning.
  • Expert programming skills in C, C++, and/or Python.
  • Familiarity with GPU computing (CUDA or similar) and HPC stack (MPI, OpenMP).
  • Strong knowledge of computer architecture, backed by relevant coursework.

Ways to stand out

  • Background in systems-level performance modeling, profiling, and analysis.
  • Experience characterizing and modeling system-level performance, performing comparison studies, and documenting/publishing results.
  • Experience improving AI inference workloads by developing CUDA kernels or compilers for custom ASIC hardware.

Compensation & Benefits

  • Base salary ranges (location- and level-dependent):
    • Level 3: 152,000 USD - 241,500 USD per year
    • Level 4: 184,000 USD - 287,500 USD per year
  • Eligible for equity and company benefits.

Additional information

  • Work model: hybrid. Listed job locations: United States, Canada, and Remote.
  • Applications accepted at least until March 16, 2026.
  • NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to diversity.