Distinguished Engineer - Rack System Software

at NVIDIA
USD 308,000-471,500 per year
SENIOR
✅ On-site

Required Skills & Competences

  • Security @ 4
  • Communication @ 4
  • Networking @ 4
  • System Architecture @ 4

Details

NVIDIA is seeking a highly motivated technical architect to drive the roadmap and innovation in rack system software architecture across firmware, kernel drivers, operating systems, fabrics, user-mode drivers and manageability software. The role focuses on NVIDIA's data center platform and node designs, from single-node HGX/DGX systems to large multi-node NVLink domain rack architectures. It requires close collaboration with component leads, partners, vendors and hyperscale/cloud service providers.

Responsibilities

  • Drive the software architecture for NVIDIA's HGX, DGX and multi-node NVLink platforms in a cross-functional environment.
  • Work directly with major customers to understand requirements and align customer roadmaps with NVIDIA's roadmap.
  • Collaborate with business partners and vendors to shape their products to meet NVIDIA's needs.
  • Develop a roadmap of new technologies and protocols; drive their design and adoption.
  • Mentor architects and engineering teams and grow them into future leaders.
  • Make key technical decisions under ambiguity and mitigate execution risks by applying shift-left strategies to accelerate time to market.

Requirements

  • BS or MS degree in Computer Engineering, Computer Science, or a related field, or equivalent experience.
  • 16+ years of experience in system architecture and design.
  • Deep experience designing architectures for scalable and performant server systems, particularly at the software/hardware interface.
  • Prior experience working with complex system software for accelerators such as GPUs, DPUs, or FPGAs.
  • Expertise in out-of-band and in-band management architectures.
  • Knowledge of device management protocols such as MCTP, PLDM and RDE.
  • Knowledge of system management protocols such as Redfish and IPMI.
  • Experience working with platform security experts to define tradeoffs between security and ease of use.
  • Demonstrable experience implementing shift-left strategies to de-risk program execution.
  • Excellent written and verbal communication skills; demonstrated ability to work with customers and partners.

Ways to stand out

  • Knowledge of cloud and cluster-level deployment and management systems.
  • Participation in and contributions to standards bodies such as OCP and DMTF.
  • Familiarity with CXL, UCIe and other chip-to-chip (C2C) technology architectures.
  • Knowledge of storage and networking technologies.

Technologies and platforms referenced

  • NVIDIA GPUs, NVIDIA NVLink, NVIDIA InfiniBand networking, NVIDIA Grace CPUs
  • HGX, DGX and multi-node NVLink platforms
  • Device and system management protocols: MCTP, PLDM, RDE, Redfish, IPMI
  • Accelerators: GPUs, DPUs, FPGAs
  • Interconnects and emerging C2C technologies: CXL, UCIe
  • Standards and ecosystem: OCP, DMTF

Compensation & Benefits

  • Base salary range: 308,000 USD - 471,500 USD (final base salary determined by location, experience, and pay of employees in similar positions).
  • Eligible for equity and additional benefits (see NVIDIA benefits page).

Additional information

  • Applications accepted at least until August 13, 2025.
  • NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment.