Senior DL Algorithms Engineer - Inference Performance

at NVIDIA

📍 Santa Clara, United States

USD 152,000-287,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The proficiency levels are:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Algorithms @ 4
Performance Optimization @ 4
LLM @ 4
PyTorch @ 6
CUDA @ 1
GPU @ 4
Deep Learning @ 4
AI @ 4
Profiling @ 4
vLLM @ 4
OpenCL @ 1
SGLang @ 4
HPC @ 6
Performance Analysis @ 4

Details

We are looking for a Senior DL Algorithms Engineer focused on LLM/Omni model inference optimizations. This role is for engineers who perform performance analysis and optimization across the full hardware/software stack — from GPU architecture to deep learning frameworks — to maximize inference performance. You will directly impact hardware and software roadmaps at a fast-growing AI company.

Responsibilities

Enable and optimize state-of-the-art open models (examples: Nemotron and Cosmos) on NVIDIA's accelerated inference software stack.
Contribute new features, fix bugs, and deliver production code to open-source frameworks such as TRT-LLM, vLLM, SGLang, FlashInfer, etc.
Profile and analyze bottlenecks across the full inference stack to push inference performance boundaries.
Benchmark state-of-the-art offerings and perform competitive analysis for NVIDIA's software/hardware stack.
Co-design with partner teams to develop the next generation of AI models and services.

Requirements

PhD in Computer Science, Electrical Engineering, CSEE, or equivalent experience.
3+ years of experience.
Strong background in deep learning and neural networks, particularly inference.
Experience with performance profiling, analysis, and optimization, especially for GPU-based applications.
Proficient in PyTorch or equivalent frameworks for AI, or experience in HPC-heavy application development.
Deep understanding of computer architecture and familiarity with GPU architecture fundamentals.

Ways to Stand Out

Proven experience with processor and system-level performance optimization.
Deep understanding of modern LLM and diffusion model architectures.
Strong fundamentals in algorithms.
GPU programming experience (CUDA or OpenCL) is a strong plus.

Compensation & Benefits

Base salary ranges:
- Level 3: 152,000 USD – 241,500 USD
- Level 4: 184,000 USD – 287,500 USD
Eligible for equity and benefits (see company benefits page).

Other Information

Applications accepted until May 9, 2026. This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.

Equal Opportunity

NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. The company does not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.