Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Python @ 3
Communication @ 7
Data Analysis @ 4
Debugging @ 4
LLM @ 4
PyTorch @ 3
AI @ 7
vLLM @ 4
SGLang @ 4
Details
NVIDIA is seeking a Senior Software Engineer to accelerate discovery and deployment of efficient quantized and sparse inference recipes for large language models (LLMs). Recipes define which operators are transformed into low-precision or sparsified variants to unlock throughput and latency gains without regressing accuracy or verbosity. Work covers kernel and model-level implementations across inference engines and collaboration with partner inference teams to optimize throughput and interactivity on target workloads.
Responsibilities
- Implement quantized and sparse recipes in inference engines (vLLM, TRT-LLM, SGLang).
- Translate recipe specifications into functionally correct, performant code (e.g., write Triton kernels, insert quantize/dequantize nodes into prefill and decode paths).
- Ensure per-expert scaling in MoE layers is handled correctly.
- Own model export pipelines (ModelOpt, Megatron-LM <-> HuggingFace) to ensure quantized checkpoints serialize correctly for downstream serving.
- Build prototypes and benchmarking harnesses to evaluate recipe throughput and interactivity before full optimization.
- Develop data analysis tooling and visualizations for numerics debugging.
- Improve developer productivity across the team (CI, build systems, training infrastructure, pipeline friction).
- Participate in code reviews and incorporate feedback.
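As a rough illustration of the quantize/dequantize and per-expert scaling items above, here is a minimal pure-Python sketch of symmetric INT8 quantization with one scale per MoE expert. The function names, the INT8 format, and the flattened-list weight layout are assumptions made for illustration; they are not taken from vLLM, TRT-LLM, SGLang, or ModelOpt, and production code would run as GPU kernels on tensors.

```python
from typing import List, Tuple

INT8_MAX = 127  # symmetric range [-127, 127]; an assumed format for this sketch


def quantize_per_expert(
    weights: List[List[float]],
) -> Tuple[List[List[int]], List[float]]:
    """Quantize each expert's weights with its own scale.

    weights[e] holds the flattened weights of expert e (hypothetical layout).
    """
    quantized: List[List[int]] = []
    scales: List[float] = []
    for expert in weights:
        max_abs = max((abs(w) for w in expert), default=0.0)
        # Fall back to 1.0 for an all-zero expert to avoid dividing by zero.
        scale = max_abs / INT8_MAX if max_abs > 0 else 1.0
        q = [max(-INT8_MAX, min(INT8_MAX, round(w / scale))) for w in expert]
        quantized.append(q)
        scales.append(scale)
    return quantized, scales


def dequantize_per_expert(
    quantized: List[List[int]], scales: List[float]
) -> List[List[float]]:
    """Map integer codes back to floats using each expert's own scale."""
    return [[q * s for q in expert] for expert, s in zip(quantized, scales)]


if __name__ == "__main__":
    # Two experts whose weight magnitudes differ by two orders of magnitude.
    experts = [[0.5, -1.0, 0.25], [100.0, -50.0, 10.0]]
    codes, scales = quantize_per_expert(experts)
    restored = dequantize_per_expert(codes, scales)
    for original, approx in zip(experts, restored):
        print(original, "->", approx)
```

Keeping a separate scale per expert means a small-magnitude expert is not rounded against the range of a large-magnitude one. With a single shared scale of 100/127 in the example above, the weight 0.25 would round to zero.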
Requirements
- Proficient in Python; familiarity with C++.
- Strong software engineering fundamentals: concise, well-tested code; fluent with AI-assisted tooling.
- Experience with ML accelerators and a basic understanding of how certain ML layers affect execution time.
- Familiarity with PyTorch internals (custom ops, autograd, export) or equivalent framework internals.
- Experience reading, modifying, or contributing to a large open-source codebase.
- MS/PhD in Computer Science or related field, or equivalent experience.
- 4+ years in a relevant software engineering role.
- Demonstrated ability to move fast with ambiguous requirements, with strong written and verbal communication.
Ways to stand out
- Experience contributing to inference serving frameworks (vLLM, TRT-LLM, SGLang) or Triton kernel development.
- Track record of debugging numerical issues across mixed-precision boundaries.
- Deep experience with model compression techniques: PTQ, QAT, structured/unstructured sparsity.
Compensation & Benefits
- Base salary ranges: 152,000 USD - 241,500 USD for Level 3; 184,000 USD - 287,500 USD for Level 4.
- Eligible for equity and company benefits (link to NVIDIA benefits referenced in original posting).
Other information
- Applications accepted at least until March 1, 2026.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer committed to diversity.