Principal Systems Software Engineer - DGX Cloud

at NVIDIA
USD 272,000-425,500 per year
SENIOR
On-site

Required Skills & Competences

All rated @ 4: Security, Kubernetes, SQL, Data Structures, Hiring, Leadership, Communication, Networking, SRE, Rust, Asynchronous Programming, API, HTTP

Details

The NVIDIA DGX Cloud organization is building NVIDIA’s accelerated compute infrastructure. This role focuses on software to assist in rapid bring-up, operation, configuration, and troubleshooting of compute hardware and networking equipment. As a Principal Systems Software Engineer, you will collaborate with software engineers, product architects, and product managers to deliver and support end-to-end software solutions for complex cloud infrastructure deployments. You will write services and software that align with the architectural vision for the NVIDIA Cloud Platform, and you will own your code from development through test and production, including operational support.

Responsibilities

  • Work with NVIDIA internal customers and demanding stakeholders.
  • Design and build scalable software systems to manage NVIDIA’s cloud infrastructure.
  • Participate in responses to real-time operational events.
  • Build network and systems automation software for managing a multi-tenant cloud infrastructure.
  • Participate in open-source communities for software NVIDIA leverages and builds.
  • Present roadmaps, vision, and demos to internal stakeholders and NVIDIA leadership.

Requirements

  • 15+ years of experience designing and building distributed software systems.
  • Track record of directly supporting systems with external customers or demanding internal customers.
  • BS/MS degree in Computer Science or related areas (or equivalent experience).
  • Demonstrated ability to write code in mainstream systems programming languages such as C, C++, Golang, or Rust.
  • Demonstrated ability to design and implement maintainable APIs for consumers.
  • Practical experience with asynchronous programming, type safety, threading models, state machines, and data structures.
  • Background in data persistence (SQL or similar).
  • Understanding of secure communication protocols (mutual TLS, IPsec, or similar).
  • Knowledge of SRE principles (observability, SLOs, logging, etc.).

Ways to Stand Out

  • Experience at a hyperscale cloud service provider (public-facing or internal).
  • Understanding of networking protocols such as IP, IPv6, BGP, HTTP, ICMP, and tunneling protocols (VXLAN, Geneve, FoU, GRE).
  • Familiarity with InfiniBand networking.
  • Background with host management systems (DHCP, Redfish, UEFI) and host security services such as TPM, TXT, and Secure Boot.
  • Experience with Kubernetes and/or distributed task scheduling.

Benefits & Other Info

  • Base salary range: 272,000 USD - 425,500 USD (determined based on location, experience, and pay of employees in similar positions).
  • Eligible for equity and additional benefits.
  • Applications accepted at least until October 24, 2025.

Company

NVIDIA leads developments in Artificial Intelligence, High-Performance Computing, and Visualization. NVIDIA seeks people passionate about developing cloud services and accelerating the next wave of AI. NVIDIA is an equal opportunity employer and values diversity in hiring and promotion practices.