Principal Firmware Engineer – Server Manageability and Observability

at NVIDIA

📍 Santa Clara, United States

USD 272,000-425,500 per year

SENIOR

✅ On-site

Required Skills & Competences

Security @ 4, Linux @ 4, Networking @ 4, System Architecture @ 4, CUDA @ 3

Details

NVIDIA data center systems (DGX, HGX) bring together NVIDIA GPUs, NVLink, InfiniBand networking, NVIDIA Grace CPUs and a fully optimized AI/HPC software stack. We are looking for a strong technical architect to own the end-to-end architecture of these products at the system software level, including firmware, kernel drivers, operating systems, and user-mode drivers. This role will work with internal component leads and engage with major cloud service providers to take these products to market.

Responsibilities

Serve as the primary technical point of contact for major customers: lead technical discussions, define KPIs, gather requirements, and address complex technical queries.
Act as a system software architect to lead technical innovation and strategic collaborations with hyperscalers for next-generation data center products.
Align NVIDIA's roadmap with major customers' requirements through direct engagement.
Develop and drive adoption of new technologies and protocols relevant to server manageability and observability.
Make critical technical decisions in ambiguous situations and mitigate program risks through shift-left strategies.
Lead complex, cross-functional projects and influence outcomes without direct authority.

Requirements

Deep expertise in scalable and performant server system architecture, focusing on software/hardware interfaces.
Extensive experience with complex system software for accelerators (GPUs, DPUs, FPGAs).
Mastery of system firmware (SBIOS, OpenBMC), embedded systems, and Linux kernel internals.
Proficiency with kernel drivers and both out-of-band and in-band management architectures.
Experience with device management protocols (MCTP, PLDM, SPDM, RDE) and system management protocols (Redfish, IPMI).
Extensive knowledge of networking technologies and protocols, including TCP/IP, Ethernet, InfiniBand, and advanced switching and routing concepts.
Experience collaborating with platform security experts to define tradeoffs between security and usability.
Demonstrated success leading complex, cross-functional programs to completion and implementing shift-left strategies to de-risk execution.
BS or MS in Computer Science, Electrical Engineering or related field, or equivalent experience.
15+ years in system architecture and design.

Ways to Stand Out

Knowledge of cloud and cluster-level deployment and management systems.
Participation in or contributions to standards bodies such as OCP and DMTF.
Familiarity with NVIDIA HPC programming models and libraries (CUDA, cuDNN, DOCA).
Knowledge of enterprise storage architectures and distributed parallel processing paradigms.

Benefits and Additional Information

Base salary range: 272,000 USD - 425,500 USD (determined based on location, experience, and comparable employees).
Eligible for equity and company benefits. (See NVIDIA benefits page.)
Applications accepted at least until August 16, 2025.
NVIDIA is an equal opportunity employer committed to diversity and inclusion.