Senior Research Engineer - Enterprise Products

at NVIDIA

📍 United States

USD 184,000-356,500 per year

SENIOR

✅ Remote

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 7, Algorithms @ 4, Data Structures @ 7, Distributed Systems @ 4, Machine Learning @ 4, TensorFlow @ 7, Communication @ 1, Mentoring @ 1, NLP @ 4, LLM @ 4, PyTorch @ 7, CUDA @ 4, GPU @ 4

Details

We are looking for a Senior Research Engineer passionate about Generative AI inference. The team develops optimized inferencing technologies for generative AI models (language, images), contributing across the ML lifecycle from conceptualization and applied research to engineering for optimized inference and deployment. Collaboration with research teams, engineers, and the open-source community is expected, and the role involves implementing optimized LLM algorithms and inference systems.

Responsibilities

Develop new models and algorithms focused on Large Language Models (LLMs), Natural Language Processing (NLP), and Deep Learning.
Design and implement multi-node serving architectures, including disaggregated serving and distributed LLM inference.
Optimize inference serving systems for multi-LoRA and other PEFT techniques.
Apply advanced quantization techniques (FP4/INT4, FP8) to reduce model footprint while preserving quality.
Implement speculative decoding (draft-target, EAGLE, Medusa, etc.) and other latency optimization strategies.
Demonstrate good engineering practices and mentor other team members.
Collaborate with engineering teams across NVIDIA to ensure seamless integration with the accelerated serving stack.

Requirements

Understanding of modern techniques in Machine Learning, Deep Neural Networks, Natural Language Processing, or Speech Recognition.
8+ years of industry experience working with Deep Learning frameworks (PyTorch or TensorFlow).
Strong software engineering skills with excellent C++ and Python development experience; meaningful contributions to major open-source projects are desirable.
Strong computer science fundamentals: algorithms and data structures, computational complexity, parallel and distributed computing, system software.
Strong communication and interpersonal skills; experience mentoring junior engineers and interns is a plus.
Bachelor’s degree or equivalent experience.
Curiosity and a desire to continuously learn and grow.

Ways to stand out

Experience architecting or developing large-scale distributed systems for deep learning.
Knowledge of CPU and/or GPU architecture.
GPU programming experience (CUDA).

Compensation & Benefits

Base salary ranges by level:
- Level 4: 184,000 USD - 299,000 USD per year
- Level 5: 224,000 USD - 356,500 USD per year
Eligible for equity and company benefits.

Additional information

Location: US (WA) — Remote option included.
Time type: Full time.
Applications accepted at least until July 29, 2025.
NVIDIA is an equal opportunity employer and values diversity in its workforce.