Senior Software Engineer - Distributed Inference

at Nvidia
📍 World · Canada · United States
USD 184,000-356,500 per year
SENIOR
✅ Remote

Required Skills & Competences

Software Development @ 7, Kubernetes @ 4, Python @ 7, R @ 4, GCP @ 7, Distributed Systems @ 4, Hiring @ 4, AWS @ 7, Azure @ 7, Rust @ 4, Microservices @ 4, API @ 4, LLM @ 4, CUDA @ 7, GPU @ 4

Details

NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. The team builds user-facing tools for Dynamo and Inference Server that make designing and deploying deep learning models easier and more accessible to data scientists. The role is remote-friendly and focuses on production-ready, low-latency inference systems running on GPU clusters.

Responsibilities

  • Build and maintain distributed model management systems, including Rust-based runtime components, for large-scale AI inference workloads (a minimal sketch of such a component follows this list).
  • Implement inference scheduling and deployment solutions on Kubernetes and Slurm, driving advances in scaling, orchestration, and resource management.
  • Collaborate with infrastructure engineers and researchers to develop scalable APIs, services, and end-to-end inference workflows.
  • Create monitoring, benchmarking, automation, and documentation processes to ensure low-latency, robust, and production-ready inference systems on GPU clusters.
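
To make the first responsibility above concrete, here is a minimal, purely illustrative Rust sketch of the kind of runtime component such a system might contain: an async registry that tracks model replicas across the cluster and picks a healthy one for routing. All names (ModelRegistry, Replica) are hypothetical and not from the posting; a production system would persist state in a consistent store and make far richer scheduling decisions.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::RwLock;

/// Hypothetical record for a model replica running somewhere in the cluster.
#[derive(Clone, Debug)]
struct Replica {
    node: String,
    gpu_index: u32,
    healthy: bool,
}

/// Toy in-process registry mapping model names to their replicas.
/// A real system would back this with a consistent store (e.g. etcd)
/// and reconcile it against the scheduler's view of the cluster.
#[derive(Clone, Default)]
struct ModelRegistry {
    inner: Arc<RwLock<HashMap<String, Vec<Replica>>>>,
}

impl ModelRegistry {
    async fn register(&self, model: &str, replica: Replica) {
        self.inner
            .write()
            .await
            .entry(model.to_string())
            .or_default()
            .push(replica);
    }

    /// Pick any healthy replica for a model; a production scheduler would
    /// weigh load, locality, and KV-cache reuse instead of taking the first.
    async fn pick_healthy(&self, model: &str) -> Option<Replica> {
        self.inner
            .read()
            .await
            .get(model)?
            .iter()
            .find(|r| r.healthy)
            .cloned()
    }
}

#[tokio::main]
async fn main() {
    let registry = ModelRegistry::default();
    registry
        .register(
            "llama-70b",
            Replica { node: "gpu-node-01".into(), gpu_index: 0, healthy: true },
        )
        .await;
    if let Some(r) = registry.pick_healthy("llama-70b").await {
        println!("routing request to {} (GPU {})", r.node, r.gpu_index);
    }
}
```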

Requirements

  • Bachelor’s, Master’s, or PhD in Computer Science, ECE, or related field (or equivalent experience).
  • 6+ years of professional systems software development experience.
  • Strong programming expertise in Rust (C++ and Python are a plus).
  • Deep knowledge of distributed systems, runtime orchestration, and cluster-scale services.
  • Hands-on experience with Kubernetes, container-based microservices, and integration with Slurm (a small Kubernetes-client sketch in Rust follows this list).
  • Proven ability to excel in fast-paced R&D environments and collaborate across functions.
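
The Kubernetes requirement is the kind of integration the Rust ecosystem typically covers with the kube and k8s-openapi crates. The sketch below is offered only as an assumption about one plausible stack: it lists the pods of a hypothetical inference deployment, and both the namespace and the label selector are invented for illustration.

```rust
use k8s_openapi::api::core::v1::Pod;
use kube::{api::{Api, ListParams}, Client};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Connect via the ambient kubeconfig or in-cluster service account.
    let client = Client::try_default().await?;

    // Pods labelled as inference workers in the "inference" namespace
    // (both the namespace and the label are made up for this example).
    let pods: Api<Pod> = Api::namespaced(client, "inference");
    let lp = ListParams::default().labels("app=inference-worker");

    for p in pods.list(&lp).await? {
        println!("found worker pod: {}", p.metadata.name.unwrap_or_default());
    }
    Ok(())
}
```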

Preferred / Ways to stand out

  • Experience with inference-serving frameworks (e.g., Dynamo Inference Server, TensorRT, ONNX Runtime) and deploying/managing LLM inference pipelines at scale (see the batching sketch after this list).
  • Contributions to large-scale, low-latency distributed systems (open-source preferred) with proven expertise in high-availability infrastructure.
  • Strong background in GPU inference performance tuning, CUDA-based systems, and operating across cloud-native and hybrid environments (AWS, GCP, Azure).
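
As a rough illustration of the LLM-serving concerns above, the sketch below shows request batching, a technique common to inference-serving frameworks: requests are collected until a batch fills or a short wait expires, then dispatched together so the GPU stays busy. This is a generic sketch, not Dynamo's or TensorRT's API; the Request type and the echo "forward pass" are stand-ins.

```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::timeout;

/// Hypothetical inference request: a prompt plus a channel for the reply.
struct Request {
    prompt: String,
    reply: tokio::sync::oneshot::Sender<String>,
}

/// Collect requests into batches: flush when the batch is full or when no
/// new request arrives within `max_wait`. Batching amortizes per-launch
/// overhead; real servers also do continuous batching at the token level.
async fn batcher(mut rx: mpsc::Receiver<Request>, max_batch: usize, max_wait: Duration) {
    loop {
        let mut batch = Vec::with_capacity(max_batch);
        // Block for the first request of a batch.
        match rx.recv().await {
            Some(req) => batch.push(req),
            None => return, // all senders dropped; shut down
        }
        // Top up the batch until it is full or the wait expires.
        while batch.len() < max_batch {
            match timeout(max_wait, rx.recv()).await {
                Ok(Some(req)) => batch.push(req),
                _ => break, // timed out or channel closed
            }
        }
        // Stand-in for a real forward pass over the whole batch.
        for req in batch {
            let _ = req.reply.send(format!("echo: {}", req.prompt));
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(64);
    tokio::spawn(batcher(rx, 8, Duration::from_millis(5)));

    let (reply_tx, reply_rx) = tokio::sync::oneshot::channel();
    tx.send(Request { prompt: "hello".into(), reply: reply_tx })
        .await
        .unwrap();
    println!("{}", reply_rx.await.unwrap());
}
```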

Compensation & Other Details

  • Base salary ranges (determined by location, experience, and internal pay bands):
    • Level 4: USD 184,000 - 287,500
    • Level 5: USD 224,000 - 356,500
  • You will also be eligible for equity and benefits.
  • Applications accepted at least until August 30, 2025.

Company & Culture

NVIDIA has a long history of innovation in GPUs and accelerated computing and is focused on advancing AI and GPU-accelerated deep learning. NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.