Used Tools & Technologies
Not specified
Required Skills & Competences
Software Development (7), Kubernetes (4), Python (7), R (4), GCP (7), Distributed Systems (4), Hiring (4), AWS (7), Azure (7), Rust (4), Microservices (4), API (4), LLM (4), CUDA (7), GPU (4)
Details
NVIDIA is hiring software engineers for its GPU-accelerated deep learning software team. The team builds user-facing tools for Dynamo and Inference Server, making it easier for data scientists to design and deploy deep learning models. The role is remote-friendly and focused on production-ready, low-latency inference systems on GPU clusters.
Responsibilities
- Build and maintain distributed model management systems, including Rust-based runtime components, for large-scale AI inference workloads.
- Implement inference scheduling and deployment solutions on Kubernetes and Slurm, driving advances in scaling, orchestration, and resource management.
- Collaborate with infrastructure engineers and researchers to develop scalable APIs, services, and end-to-end inference workflows.
- Create monitoring, benchmarking, automation, and documentation processes to ensure low-latency, robust, and production-ready inference systems on GPU clusters.
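To illustrate the kind of Rust-based runtime component the responsibilities describe, here is a minimal sketch of a round-robin dispatcher that assigns inference requests to GPU workers. All names (`Dispatcher`, `assign`, the `gpu-*` worker labels) are hypothetical and for illustration only; they are not NVIDIA or Dynamo APIs, and a production scheduler would account for load, batching, and failures.

```rust
/// Illustrative sketch only: a round-robin dispatcher assigning
/// inference requests to GPU workers. Not an NVIDIA/Dynamo API.
struct Dispatcher {
    workers: Vec<String>, // worker identifiers, e.g. pod or node names
    next: usize,          // index of the next worker to receive a request
}

impl Dispatcher {
    fn new(workers: Vec<String>) -> Self {
        Dispatcher { workers, next: 0 }
    }

    /// Assign a request to the next worker in round-robin order,
    /// returning the (request_id, worker) pairing.
    fn assign(&mut self, request_id: u64) -> (u64, String) {
        let worker = self.workers[self.next % self.workers.len()].clone();
        self.next += 1;
        (request_id, worker)
    }
}

fn main() {
    let mut d = Dispatcher::new(vec!["gpu-0".into(), "gpu-1".into()]);
    for id in 0..4 {
        let (req, worker) = d.assign(id);
        println!("request {} -> {}", req, worker);
    }
}
```

Round-robin is the simplest placement policy; real inference schedulers typically weigh GPU memory headroom, in-flight batch sizes, and locality before choosing a worker.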
Requirements
- Bachelor’s, Master’s, or PhD in Computer Science, ECE, or related field (or equivalent experience).
- 6+ years of professional systems software development experience.
- Strong programming expertise in Rust (C++ and Python are a plus).
- Deep knowledge of distributed systems, runtime orchestration, and cluster-scale services.
- Hands-on experience with Kubernetes, container-based microservices, and integration with Slurm.
- Proven ability to excel in fast-paced R&D environments and collaborate across functions.
Preferred / Ways to stand out
- Experience with inference-serving frameworks (e.g., Dynamo Inference Server, TensorRT, ONNX Runtime) and deploying/managing LLM inference pipelines at scale.
- Contributions to large-scale, low-latency distributed systems (open-source preferred) with proven expertise in high-availability infrastructure.
- Strong background in GPU inference performance tuning, CUDA-based systems, and operating across cloud-native and hybrid environments (AWS, GCP, Azure).
Compensation & Other Details
- Base salary ranges (determined by location, experience, and internal pay bands):
  - Level 4: 184,000 USD - 287,500 USD
  - Level 5: 224,000 USD - 356,500 USD
- You will also be eligible for equity and benefits.
- Applications accepted at least until August 30, 2025.
Company & Culture
NVIDIA has a long history of innovation in GPUs and accelerated computing and is focused on advancing AI and GPU-accelerated deep learning. NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.