Distinguished Engineer – Data Center System Software Architect

at NVIDIA
USD 308,000-471,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Security @ 4, Linux @ 4, Networking @ 4, System Architecture @ 4, CUDA @ 3

Details

NVIDIA data center systems (DGX, HGX) integrate NVIDIA GPUs, NVLink, InfiniBand networking, NVIDIA Grace CPUs, and an optimized AI/HPC software stack. This role calls for a senior system software architect who owns the end-to-end architecture at the system software level, spanning firmware, kernel drivers, operating systems, and user-mode drivers. The architect works with internal component leads and hyperscaler/cloud customers to take products to market.

Responsibilities

  • Serve as the primary technical point of contact for major customers: lead technical discussions, define KPIs, gather requirements, and address complex technical queries.
  • Lead technical innovation and strategic collaborations with major hyperscalers to architect next-generation data center products.
  • Align NVIDIA's roadmap with major customers' requirements through direct engagement.
  • Develop and drive adoption of new technologies and protocols.
  • Make critical technical decisions in ambiguous situations and mitigate risks through shift-left strategies (finding integration and validation issues earlier in the development cycle).
  • Lead complex, cross-functional projects to completion and influence large-scale collaborative environments without direct authority.

Requirements

  • Deep expertise in scalable and performant server system architecture, with focus on software/hardware interfaces.
  • Extensive experience with complex system software for accelerators (GPUs, DPUs, FPGAs).
  • Mastery of system firmware (SBIOS, OpenBMC), embedded systems, and Linux kernel internals.
  • Strong experience designing and implementing kernel drivers, user-mode drivers, and related OS-level components.
  • Proficiency with out-of-band and in-band management architectures, device management protocols (MCTP, PLDM, SPDM, RDE), and system management protocols (Redfish, IPMI).
  • Extensive knowledge of networking technologies and protocols including TCP/IP, Ethernet, InfiniBand, and advanced switching and routing concepts.
  • Experience collaborating with platform security experts to define tradeoffs between security and usability.
  • Demonstrated success leading complex, cross-functional programs and applying shift-left strategies to de-risk program execution.
  • BS or MS in Computer Science, Electrical Engineering or related field (or equivalent experience).
  • 20+ years of experience in system architecture and design.

Ways to stand out / Preferred

  • Knowledge of cloud and cluster-level deployment and management systems.
  • Participation in standards bodies such as OCP and DMTF.
  • Familiarity with NVIDIA HPC programming models and libraries (CUDA, cuDNN, DOCA).
  • Knowledge of enterprise storage architectures and distributed parallel processing paradigms.

Benefits

  • Base salary range: 308,000 USD - 471,500 USD (determined based on location, experience, and market).
  • Eligibility for equity and company benefits.
  • NVIDIA is an equal opportunity employer committed to diversity and inclusion.

Additional details

  • Location: Santa Clara, CA, United States.
  • Employment type: Full time.
  • Applications accepted at least until August 13, 2025.