Required Skills & Competences
Python @ 7, Algorithms @ 4, Data Structures @ 7, Distributed Systems @ 4, Machine Learning @ 4, TensorFlow @ 7, Communication @ 1, Mentoring @ 1, LLM @ 4, PyTorch @ 7, CUDA @ 4, GPU @ 4
Details
We are now looking for a Senior Research Engineer passionate about Generative AI inference. NVIDIA is at the forefront of generative AI models, from language to images, providing building blocks to democratize AI and make generative AI easy to develop, integrate, and deploy. The team develops optimized inference technologies to support growing generative AI needs, covering all steps of the machine learning lifecycle: conceptualization, applied research, engineering for optimized inference, and deployment. The team collaborates with research teams, engineers, and the open-source community to implement optimized Large Language Model (LLM) algorithms.
Responsibilities
- Develop new models and algorithms focused on Large Language Models, Natural Language Processing, and Deep Learning.
- Design and implement multi-node serving architectures, disaggregated serving, and distributed LLM inference.
- Optimize inference serving systems for multi-LoRA and other PEFT techniques.
- Apply sophisticated quantization techniques (FP4/INT4, FP8) to reduce model footprint while preserving quality.
- Implement speculative decoding (draft-target, EAGLE, Medusa, etc.) and other latency optimization strategies.
- Demonstrate good engineering practices and mentor other team members.
- Collaborate across engineering teams at NVIDIA to ensure seamless integration of software up and down the accelerated serving stack.
Requirements
- Understanding of modern Machine Learning techniques, Deep Neural Networks, Natural Language Processing, or Speech Recognition.
- 8+ years of industry experience with Deep Learning frameworks such as PyTorch or TensorFlow.
- Passion for software engineering with strong skills in C++ and Python development; meaningful contributions to major open-source projects.
- Strong communication and interpersonal skills; ability to work in a dynamic and distributed team; mentoring experience is a plus.
- Bachelor's degree or equivalent experience.
- Desire for continuous growth and learning.
- Strong computer science fundamentals including algorithms and data structures, computational complexity, parallel and distributed computing, and system software.
Ways to Stand Out
- Experience architecting or developing large-scale distributed systems for deep learning.
- Knowledge of CPU and/or GPU architecture.
- GPU programming experience including CUDA.
Compensation
The base salary range is 184,000 USD - 356,500 USD. Equity and benefits are also provided. Base salary depends on location, experience, and pay of employees in similar positions.
NVIDIA is committed to diversity and is an equal opportunity employer, valuing inclusion across all protected characteristics.