Distinguished Engineer - Data Center System Software Architect
at NVIDIA
Santa Clara, United States
USD 308,000-471,500 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
Security @ 4, Linux @ 4, Networking @ 4, System Architecture @ 4, CUDA @ 3
Details
NVIDIA data center systems, such as DGX and HGX, are core to NVIDIA's enterprise and cloud provider businesses. These platforms bring together NVIDIA GPUs, NVLink, InfiniBand networking, NVIDIA Grace CPUs, and an optimized NVIDIA AI and HPC software stack. This role is for a technical architect to own the end-to-end architecture of these products at the system software level, including firmware, kernel drivers, operating systems, and user-mode drivers. The role involves working with internal component leads and engaging with cloud service providers to bring products to market.
Responsibilities
- Serve as the primary technical point of contact for major customers: lead technical discussions, define KPIs, gather requirements, and address complex technical queries.
- Act as a system software architect to lead technical innovation and strategic collaborations with major hyperscalers to architect next-generation data center products.
- Align NVIDIA's roadmap with major customers' requirements through direct engagement.
- Develop and drive adoption of new technologies and protocols.
- Make critical technical decisions in ambiguous situations and mitigate risks through left-shift strategies.
- Lead complex, cross-functional projects and influence stakeholders without direct authority.
Requirements
- Deep expertise in scalable and performant server system architecture, with emphasis on software/hardware interfaces.
- Extensive experience with complex system software for accelerators (GPUs, DPUs, FPGAs).
- Mastery of system firmware (SBIOS, OpenBMC), embedded systems, and Linux kernel internals.
- Proficiency with out-of-band and in-band management architectures and device management protocols such as MCTP, PLDM, SPDM, and RDE.
- Knowledge of system management protocols such as Redfish and IPMI.
- Extensive knowledge of networking technologies and protocols, including TCP/IP, Ethernet, InfiniBand, and advanced switching/routing concepts.
- Experience collaborating with platform security experts to define tradeoffs between security and usability.
- Demonstrated success leading complex, cross-functional programs and implementing left-shift strategies to de-risk program execution.
- BS or MS degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience).
- 20+ years in system architecture and design.
Ways to stand out
- Knowledge of cloud and cluster-level deployment and management systems.
- Participation or contributions in standards bodies such as OCP and DMTF.
- Familiarity with NVIDIA HPC programming models and libraries (CUDA, cuDNN, DOCA).
- Knowledge of enterprise storage architectures and distributed parallel processing paradigms.
Compensation & Benefits
- Base salary range: 308,000 USD - 471,500 USD (final base salary determined by location, experience, and comparable roles).
- Eligible for equity and benefits (see NVIDIA benefits page).
Other
- Applications accepted at least until October 22, 2025.
- NVIDIA is an equal opportunity employer and values diversity.