Senior Deep Learning Frameworks CUDA Software Engineer

at NVIDIA
USD 224,000-356,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 4, Machine Learning @ 4, Communication @ 4, System Architecture @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4, Deep Learning @ 4, AI @ 4, Reinforcement Learning @ 4, vLLM @ 4, NCCL @ 4, TensorRT @ 4, SGLang @ 4, HPC @ 4, Performance Analysis @ 4, JAX @ 4

Details

NVIDIA is looking for a motivated Deep Learning engineer to bring advanced CUDA features and distributed runtime technologies into AI stacks (PyTorch, TRT-LLM, vLLM, SGLang, JAX, etc.). You will work with teams that created core CUDA features and runtimes for scaling Deep Learning and HPC applications, addressing multi-GPU demands from training at very large scale to low-latency inference.

Responsibilities

  • Integrate new CUDA features and runtime abstractions into AI frameworks: from proof-of-concept to performance analysis to production.
  • Perform deep analysis of AI workloads and frameworks to identify requirements and opportunities to innovate in the lower layers of the stack; collaborate hands-on with teams working on the latest AI models.
  • Own and drive improvements in the AI compiler-runtime interface to build high-performance multi-GPU, multi-node solutions.
  • Design fault-tolerant and elastic solutions for large-scale or dynamic AI workloads.
  • Influence the roadmap of core CUDA to facilitate next-generation deep learning frameworks.
  • Collaborate across multiple time zones with AI researchers, hardware and software architects, kernel and compiler authors, and CUDA driver experts to co-design systems and frameworks that enhance performance and programmability.
  • Develop exploratory tools and runtime systems to profile and accelerate new paradigms in deep learning.
  • Write clean, effective, and maintainable code so prototypes can transition into open-source releases, upstream framework integrations, internal tools, or closed-source commercial products.

Requirements

  • BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or related field (or equivalent experience).
  • 8+ years of relevant industry experience, or equivalent academic experience after completing a degree.
  • Development experience with deep learning frameworks such as PyTorch and JAX, and inference engines such as TRT-LLM, vLLM, SGLang.
  • Rapid prototyping and development experience with Python, C++, CUDA or related domain-specific languages.
  • Solid grasp of AI models, parallelism strategies, and/or compiler technologies (e.g., torch.compile).
  • Experience conducting performance benchmarking on AI clusters and familiarity with at least one performance profiler toolchain (PyTorch profiler, NVIDIA Nsight Systems).
  • Understanding of HPC/AI communication concepts and communication libraries.
  • Good understanding of computer system architecture, hardware-software interactions, and operating system principles (systems software fundamentals).
  • Adaptability and passion to learn new frameworks and tools, and flexibility to work and communicate effectively across different teams and time zones.

Ways to stand out

  • Deep expertise in the performance internals and execution graphs of major deep learning autograd, training, and inference frameworks (PyTorch, JAX, TensorRT, vLLM, SGLang, NeMo, Megatron, etc.).
  • Hands-on experience with CUDA, communication libraries (NCCL, MPI, UCX) and distributed machine learning techniques (pipeline parallelism, tensor parallelism).
  • Expertise in areas such as training, distributed inference, Mixture of Experts (MoE), reinforcement learning, or kernel authoring (CUDA, Triton, CuTe).
  • Background in deep learning compilers, both graph-level and codegen (Triton, XLA, torch.compile).
  • Experience programming for compute & communication overlap in distributed runtimes.

Compensation & Benefits

  • Base salary ranges published in the posting:
    • Level 4: 184,000 USD - 287,500 USD
    • Level 5: 224,000 USD - 356,500 USD
  • You will also be eligible for equity and benefits (link to NVIDIA benefits referenced in the original posting).
  • Applications accepted at least until May 18, 2026.

Company

  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. The posting includes standard non-discrimination statements.