Senior Software Engineer, Quantized Inference

at NVIDIA
USD 152,000-287,500 per year
SENIOR
On-site

Used Tools & Technologies

Machine Learning

Required Skills & Competences

Python @ 3, Communication @ 7, Data Analysis @ 4, Debugging @ 4, LLM @ 4, PyTorch @ 3, AI @ 7, vLLM @ 4, SGLang @ 4

Details

NVIDIA is seeking a Senior Software Engineer to accelerate the discovery and deployment of efficient quantized and sparse inference recipes for large language models (LLMs). A recipe defines which operators are transformed into low-precision or sparsified variants to unlock throughput and latency gains without regressing accuracy or verbosity. The work covers kernel-level and model-level implementations across inference engines, as well as collaboration with partner inference teams to optimize throughput and interactivity on target workloads.

Responsibilities

  • Implement quantized and sparse recipes in inference engines (vLLM, TRT-LLM, SGLang).
  • Translate recipe specifications into functionally correct, performant code (e.g., write Triton kernels, insert quantize/dequantize nodes into prefill and decode paths).
  • Ensure per-expert scaling in MoE layers is handled correctly.
  • Own model export pipelines (ModelOpt, Megatron-LM <-> HuggingFace) to ensure quantized checkpoints serialize correctly for downstream serving.
  • Build prototypes and benchmarking harnesses to evaluate recipe throughput and interactivity before full optimization.
  • Develop data analysis tooling and visualizations for numerics debugging.
  • Improve developer productivity across the team (CI, build systems, training infrastructure, pipeline friction).
  • Participate in code reviews and incorporate feedback.

Requirements

  • Proficient in Python; familiarity with C++.
  • Strong software engineering fundamentals: writes concise, well-tested code and is fluent with AI-assisted tooling.
  • Experience with ML accelerators and a basic understanding of how certain ML layers affect execution time.
  • Familiarity with PyTorch internals (custom ops, autograd, export) or equivalent framework internals.
  • Experience reading, modifying, or contributing to a large open-source codebase.
  • MS/PhD in Computer Science or related field, or equivalent experience.
  • 4+ years in a relevant software engineering role.
  • Demonstrated ability to move fast under ambiguous requirements; strong written and verbal communication.

Ways to stand out

  • Experience contributing to inference serving frameworks (vLLM, TRT-LLM, SGLang) or Triton kernel development.
  • Track record of debugging numerical issues across mixed-precision boundaries.
  • Deep experience with model compression techniques: PTQ, QAT, structured/unstructured sparsity.

Compensation & Benefits

  • Base salary range: 152,000 USD - 241,500 USD for Level 3; 184,000 USD - 287,500 USD for Level 4.
  • Eligible for equity and company benefits (see the NVIDIA benefits page linked in the original posting).

Other information

  • Applications are accepted at least until March 1, 2026.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer committed to diversity.