Senior Deep Learning Inference Performance Architect

at NVIDIA

📍 Durham, United States

USD 184,000-356,500 per year

SENIOR

✅ On-site

Required Skills & Competences

Python @ 4, Algorithms @ 4, Machine Learning @ 7, Mathematics @ 6, CUDA @ 4, GPU @ 4

Details

We are now looking for a Senior Deep Learning Inference Performance Architect!

NVIDIA is seeking a Senior Performance Architect: a creative engineer who loves to squeeze every cycle of performance out of deep learning software. The Inference Architecture team does hardware-software co-design work that focuses on accelerating AI Inference workloads. In this role, you will write performance-optimized low-level code on GPUs, evaluate and improve state-of-the-art performance techniques in production Large Language Model deployments, and help guide future GPU architecture decisions. If you enjoy digging deep into GPU architecture details, are passionate about AI, and know where every cycle goes when you write highly tuned software, this role may be a great fit.

Responsibilities

- Develop innovative GPU and system architectures to extend the state of the art in AI Inference performance and efficiency.
- Model, analyze, and prototype key deep learning algorithms and applications.
- Understand and analyze how hardware and software architectures interact and how that interplay affects future algorithms and applications.
- Write efficient software for AI Inference, including CUDA kernels, framework-level code, and application-level code.
- Collaborate across the company to guide the direction of AI, working with software, research, and product teams.

Requirements

- MS or PhD in a relevant discipline (Computer Science, Electrical Engineering, Mathematics) or equivalent experience, with 5+ years of relevant experience.
- Strong mathematical foundation in machine learning and deep learning.
- Expert programming skills in C, C++, and Python.
- Familiarity with GPU computing (CUDA or similar) and HPC (MPI, OpenMP).
- Strong knowledge of, and coursework in, computer architecture.

Ways to stand out (preferred / nice-to-have)

- Background in systems-level performance modeling, profiling, and analysis.
- Experience characterizing and modeling system-level performance, executing comparison studies, and documenting/publishing results.
- Experience optimizing AI Inference workloads with CUDA kernel development.

Compensation & Benefits

Base salary ranges provided by level:
- Level 4: 184,000 USD - 287,500 USD
- Level 5: 224,000 USD - 356,500 USD
You will also be eligible for equity and benefits (see NVIDIA benefits page).

Additional information

Applications accepted at least until November 1, 2025.
NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.