Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — expert. Able to teach others and familiar with the technology's pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Algorithms @ 4
Performance Optimization @ 4
LLM @ 4
PyTorch @ 6
CUDA @ 1
GPU @ 4
Deep Learning @ 4
AI @ 4
Profiling @ 4
vLLM @ 4
OpenCL @ 1
SGLang @ 4
HPC @ 6
Performance Analysis @ 4
Details
We are looking for a Senior DL Algorithms Engineer focused on LLM/Omni model inference optimizations. This role is for engineers who perform performance analysis and optimization across the full hardware/software stack — from GPU architecture to deep learning frameworks — to maximize inference performance. You will directly impact hardware and software roadmaps at a fast-growing AI company.
Responsibilities
- Enable and optimize state-of-the-art open models (examples: Nemotron and Cosmos) on NVIDIA's accelerated inference software stack.
- Contribute new features, fix bugs, and deliver production code to open-source frameworks such as TRT-LLM, vLLM, SGLang, and FlashInfer.
- Profile and analyze bottlenecks across the full inference stack to push inference performance boundaries.
- Benchmark state-of-the-art offerings and perform competitive analysis for NVIDIA's software/hardware stack.
- Co-design with partner teams to develop the next generation of AI models and services.
Requirements
- PhD in Computer Science, Electrical Engineering, CSEE, or equivalent experience.
- 3+ years of relevant experience.
- Strong background in deep learning and neural networks, particularly inference.
- Experience with performance profiling, analysis, and optimization, especially for GPU-based applications.
- Proficiency in PyTorch or an equivalent AI framework, or experience in HPC-heavy application development.
- Deep understanding of computer architecture and familiarity with GPU architecture fundamentals.
Ways to Stand Out
- Proven experience with processor and system-level performance optimization.
- Deep understanding of modern LLM and diffusion model architectures.
- Strong fundamentals in algorithms.
- GPU programming experience (CUDA or OpenCL) is a strong plus.
Compensation & Benefits
- Base salary ranges:
  - Level 3: 152,000 USD – 241,500 USD
  - Level 4: 184,000 USD – 287,500 USD
- Eligible for equity and benefits (see company benefits page).
Other Information
- Applications accepted until May 9, 2026. This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
Equal Opportunity
NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. The company does not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.