Principal Software Engineer - DGX Cloud

at NVIDIA
USD 272,000-431,250 per year
SENIOR
✅ On-site

Used Tools & Technologies

Machine Learning

Required Skills & Competences

Security @ 4, Docker @ 7, Go @ 6, Grafana @ 4, Kubernetes @ 4, Prometheus @ 4, Python @ 6, GCP @ 7, Java @ 6, Distributed Systems @ 4, Leadership @ 4, AWS @ 7, Azure @ 7, API @ 4, Technical Leadership @ 4, OpenTelemetry @ 4, CUDA @ 3, Cloud Computing @ 4, GPU @ 4, Observability @ 4, AI @ 4, Data Pipelines @ 4, Slurm @ 4

Details

NVIDIA is seeking a Principal Software Engineer to join the DGX Cloud team to build foundational systems that drive NVIDIA's high-performance GPU infrastructure. You will craft scalable automation solutions, integrate diverse systems, and enable seamless workflows across global cloud operations. As a Principal Engineer in DGX Cloud you will provide technical leadership for the platform that supports AI and cloud computing workloads.

Responsibilities

  • Lead the development of next-generation APIs, state management, and workflow orchestration systems that automate fleet lifecycle operations at massive scale.
  • Drive technical alignment across dependent systems and partner teams to ensure cohesive integration, clear interfaces, and reliable end-to-end workflows, with a strong focus on delivery.
  • Coach, mentor, and encourage senior engineers; elevate technical standards and guidelines across the organization.
  • Maintain strong focus on customer experience and product requirements, translating technical insight into high-impact business solutions.
  • Partner with executive and engineering leadership to codify business processes into self-measuring, scalable, and operationally consistent platforms to reduce manual toil.
  • Direct the integration strategy for key technologies, including common AI schedulers (e.g., Kubernetes, Slurm) and observability systems (e.g., Prometheus, OpenTelemetry, Grafana).

Requirements

  • 16+ years of progressive industry experience.
  • Master's or Bachelor's degree, or equivalent experience defining and shipping complex distributed systems.
  • Deep, hands-on expertise in establishing, operating, and scaling services in fast-paced, high-reliability environments.
  • Ability to thrive in ambiguous, fast-paced environments by rapidly testing ideas, iterating toward working solutions, and hardening winners into reliable, scalable systems.
  • Outstanding proficiency in modern programming languages such as Go, Java, or Python.
  • Proven track record of defining, owning, and evolving the architecture of high-scale distributed systems, including advanced patterns for APIs, control planes, and data pipelines.
  • Deep understanding of global cloud infrastructure (AWS, GCP, Azure) and container ecosystems (Docker, Kubernetes).
  • Demonstrated ability to drive technical strategy and influence outcomes across organizational boundaries.
  • Outstanding ability to communicate complex technical concepts, drive organizational consensus, and mentor high-performing engineers.

Ways to Stand Out from the Crowd

  • History of leading development and adoption of organization-wide workflow orchestration systems for petabyte-scale infrastructure.
  • Experience in a Principal/Staff+ capacity delivering measurable improvements in operational efficiency, reliability, and security across a large engineering organization.
  • Deep familiarity with operational and deployment aspects of the NVIDIA AI/ML software stack (CUDA, cuDNN, containerization).
  • Patent contributions or a strong publication record in distributed systems, cloud computing, or infrastructure automation.

Compensation & Benefits

  • Base salary range: 272,000 USD - 431,250 USD (final base salary determined by location, experience, and pay of employees in similar positions).
  • Eligible for equity and benefits. See: https://www.nvidiabenefits.com/

Other details

  • Applications accepted at least until May 3, 2026.
  • This posting is for an existing vacancy.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer and does not discriminate based on protected characteristics.