Principal Software Engineer - DGX Cloud Kubernetes Runtime Team

at Nvidia
📍 United States
USD 272,000-425,500 per year
SENIOR
✅ Remote

Required Skills & Competences

Security @ 3, Go @ 7, Kubernetes @ 4, Distributed Systems @ 7, Helm @ 4, API @ 4, GPU @ 4

Details

Join NVIDIA's DGX Cloud Kubernetes Runtime team and be at the forefront of building the next generation of GPU-accelerated Kubernetes runtime distributions. You will design and build automation systems that enable operators to seamlessly install, upgrade, and manage cluster runtime packages powering NVIDIA's AI Accelerators. The team provides a Kubernetes runtime distribution that can be applied to any cluster using NVIDIA accelerators, empowering operators with automation-first, self-service tools that minimize manual effort while enhancing reliability and reproducibility.

Location: US (Washington); Remote (US). Employment type: Full-time.

Responsibilities

  • Design and implement the runtime controller system that manages the lifecycle of runtime packages across thousands of Kubernetes clusters without manual pipeline intervention.
  • Build and maintain the runtime builder that packages, validates, and distributes GPU operators, DRA drivers, network components, and other accelerated compute runtime packages.
  • Develop Kubernetes controllers, CustomResourceDefinitions (CRDs), and operators that automate runtime installation, upgrade, and rollback operations with API-driven workflows.
  • Create expansion rules and component management systems that enable flexible runtime composition across different cloud providers and GPU architectures.
  • Work with internal teams to migrate from GitLab pipeline-based deployments to fully automated, controller-powered runtime management.

Requirements

  • Experience building production Kubernetes systems with deep expertise in controllers, operators, and CustomResourceDefinitions.
  • Strong proficiency in Go and experience building scalable Go services that manage complex distributed systems.
  • Hands-on experience with Helm, Kustomize, and managing Kubernetes manifest packaging and templating at scale.
  • Deep understanding of Kubernetes architecture including API machinery, admission controllers, and resource lifecycle management.
  • Demonstrated ability to design and implement automation systems that replace manual processes with reliable, self-service tooling.
  • Master's and/or PhD in Computer Science, or equivalent experience.
  • 15+ years of professional experience, including at least 4 years of Kubernetes development.

Preferred / Ways to stand out

  • Experience building multi-tenant platform services with focus on API design, versioning, and backward compatibility.
  • Familiarity with OCI registries, artifact signing, SBOM generation, and supply chain security practices.
  • Experience working with GPU operators, device plugins, or other hardware acceleration components in Kubernetes.
  • Track record of migrating legacy systems to modern, automated platforms while maintaining zero-downtime operations.
  • Contributions to upstream Kubernetes projects or experience extending Kubernetes API machinery.

Compensation & Benefits

  • Base salary range: 272,000 USD - 425,500 USD (will be determined based on location, experience, and pay of employees in similar positions).
  • Eligible for equity and company benefits (see NVIDIA benefits).
  • Applications accepted at least until November 10, 2025.

Company & Diversity

NVIDIA is a leader in AI, High-Performance Computing and Visualization. NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. The company does not discriminate on the basis of any characteristic protected by law.