Used Tools & Technologies
Not specified
Required Skills & Competences
Python @ 7, Algorithms @ 4, Data Structures @ 7, Distributed Systems @ 4, Machine Learning @ 4, TensorFlow @ 7, Communication @ 7, Mentoring @ 4, NLP @ 4, LLM @ 4, PyTorch @ 7, CUDA @ 4, GPU @ 4
Details
We are looking for a Senior Research Engineer who is passionate about generative AI inference and wants to change the way people infuse AI into products and services. The team focuses on optimized inference technologies for generative AI models (language, images) and contributes across the ML lifecycle: conceptualization, applied research, engineering for optimized inference, and deployment. The role involves collaborating with research teams, engineers, and the open-source community, and implementing optimized LLM algorithms.
Responsibilities
- Develop new models and algorithms focused on Large Language Models (LLMs), Natural Language Processing (NLP), and Deep Learning.
- Design and implement multi-node serving architectures, disaggregated serving, and distributed LLM inference.
- Optimize multi-LoRA (and other PEFT techniques) inference serving systems.
- Apply sophisticated quantization techniques (FP4/INT4, FP8) to reduce model footprint while preserving quality.
- Implement speculative decoding (e.g., draft-target, EAGLE, Medusa) and other latency optimization strategies.
- Demonstrate strong engineering practices and mentor other team members.
- Collaborate across NVIDIA engineering teams to ensure software integrates with the NVIDIA accelerated serving stack.
Requirements
- Understanding of modern techniques in Machine Learning, Deep Neural Networks, Natural Language Processing, or Speech Recognition.
- 8+ years of industry experience with deep learning frameworks (PyTorch or TensorFlow).
- Strong software engineering skills with excellent C++ and Python development experience and meaningful contributions to major open-source projects.
- Strong communication and interpersonal skills; ability to work in dynamic and distributed teams. History of mentoring junior engineers and interns is a plus.
- Bachelor’s degree or equivalent experience.
- Desire to constantly grow and learn new things.
- Strong computer science fundamentals: algorithms and data structures, computational complexity, parallel and distributed computing, and system software.
Ways to stand out
- Experience architecting or developing large-scale distributed systems for deep learning.
- Knowledge of CPU and/or GPU architecture.
- GPU programming (CUDA).
Compensation & Benefits
- Base salary ranges by level:
  - Level 4: 184,000 USD - 299,000 USD
  - Level 5: 224,000 USD - 356,500 USD
- Eligible for equity and company benefits (link to benefits provided).
Other details
- Location: US, WA, Remote. The position is offered remotely.
- Applications accepted at least until September 30, 2025.
- NVIDIA is an equal opportunity employer and values diversity across all characteristics protected by law.