Senior Software Engineer - NIM Factory Container And Cloud Infrastructure
at NVIDIA
Santa Clara, United States
USD 184,000-356,500 per year
Required Skills & Competences
Security, Docker, Kubernetes, Python, CI/CD, Communication, Helm, SRE, Microservices, API, LLM, CUDA, GPU
Details
NVIDIA is seeking a Senior Software Engineer focused on container and cloud infrastructure to design and implement the core container strategy for NVIDIA Inference Microservices (NIMs) and hosted services. You will build enterprise-grade software and tooling for container build, packaging, and deployment, improve reliability, performance, and scale across thousands of GPUs, and support emerging deployment patterns including disaggregated LLM inference.
Responsibilities
- Design, build, and harden containers for NIM runtimes and inference backends; enable reproducible, multi-arch, CUDA-optimized builds.
- Develop Python tooling and services for build orchestration, CI/CD integrations, Helm/Operator automation, and test harnesses; enforce quality with typing, linting, and unit/integration tests.
- Help design and evolve Kubernetes deployment patterns for NIMs, including GPU scheduling, autoscaling, and multi-cluster rollouts.
- Optimize container performance: layer layout, startup time, build caching, runtime memory/IO, network, and GPU utilization; instrument with metrics and tracing.
- Evolve the base image strategy, dependency management, and artifact/registry topology.
- Collaborate across research, backend, SRE, and product teams to ensure day-0 availability of new models.
- Mentor teammates and set high engineering standards for container quality, security, and operability.
 
Requirements
- 10+ years building production software with a strong focus on containers and Kubernetes.
- Strong Python skills building production-grade tooling and services.
- Experience with Python SDKs and clients for Kubernetes and cloud services.
- Expert knowledge of Docker/BuildKit, containerd/OCI, image layering, multi-stage builds, and registry workflows.
- Deep experience operating workloads on Kubernetes.
- Strong understanding of LLM inference features, including structured output, KV cache, and LoRA adapters.
- Hands-on experience building and running GPU workloads in Kubernetes, including the NVIDIA device plugin, MIG, CUDA drivers/runtime, and resource isolation.
- Excellent collaboration and communication skills; ability to influence cross-functional design.
- A degree in Computer Science, Computer Engineering, or a related field (BS or MS), or equivalent experience.
 
Ways to stand out
- Expertise with Helm chart design systems, Operators, and platform APIs serving many teams.
- Experience with the OpenAI and Hugging Face APIs, and an understanding of the differences between inference backends (vLLM, SGLang, TRT-LLM).
- Background in benchmarking and optimizing inference container performance and startup latency at scale.
- Prior experience designing multi-tenant, multi-cluster, or edge/air-gapped container delivery.
- Contributions to open-source container, Kubernetes, or GPU ecosystems.
 
Benefits
NVIDIA offers competitive salaries, equity, and a generous benefits package. Applications for this job will be accepted at least until September 22, 2025.
Compensation
Base salary range (location- and level-dependent):
- Level 4: 184,000 USD - 287,500 USD
- Level 5: 224,000 USD - 356,500 USD