Principal Firmware Engineer – Server Manageability and Observability

at NVIDIA
USD 272,000-425,500 per year
SENIOR
On-site

Required Skills & Competencies

  • Security @ 4
  • Linux @ 4
  • Networking @ 4
  • System Architecture @ 4
  • CUDA @ 3

Details

NVIDIA data center systems (DGX, HGX) bring together NVIDIA GPUs, NVLink, InfiniBand networking, NVIDIA Grace CPUs and a fully optimized AI/HPC software stack. We are looking for a strong technical architect to own the end-to-end architecture of these products at the system software level, including firmware, kernel drivers, operating systems, and user-mode drivers. This role will work with internal component leads and engage with major cloud service providers to take these products to market.

Responsibilities

  • Serve as the primary technical point of contact for major customers: lead technical discussions, define KPIs, gather requirements, and address complex technical queries.
  • Act as a system software architect to lead technical innovation and strategic collaborations with hyperscalers for next-generation data center products.
  • Align NVIDIA's roadmap with major customers' requirements through direct engagement.
  • Develop and drive adoption of new technologies and protocols relevant to server manageability and observability.
  • Make critical technical decisions in ambiguous situations and mitigate program risks through left-shift strategies.
  • Lead complex, cross-functional projects and influence outcomes without direct authority.

Requirements

  • Deep expertise in scalable and performant server system architecture, focusing on software/hardware interfaces.
  • Extensive experience with complex system software for accelerators (GPUs, DPUs, FPGAs).
  • Mastery of system firmware (SBIOS, OpenBMC), embedded systems, and Linux kernel internals.
  • Proficiency with kernel drivers and both out-of-band and in-band management architectures.
  • Experience with device management protocols (MCTP, PLDM, SPDM, RDE) and system management protocols (Redfish, IPMI).
  • Extensive knowledge of networking technologies and protocols, including TCP/IP, Ethernet, InfiniBand, and advanced switching and routing concepts.
  • Experience collaborating with platform security experts to define tradeoffs between security and usability.
  • Demonstrated success leading complex, cross-functional programs to completion and implementing left-shift strategies to de-risk execution.
  • BS or MS in Computer Science, Electrical Engineering or related field, or equivalent experience.
  • 15+ years in system architecture and design.

Ways to Stand Out

  • Knowledge of cloud and cluster-level deployment and management systems.
  • Participation in, or contributions to, standards bodies such as OCP and DMTF.
  • Familiarity with NVIDIA HPC programming models and libraries (CUDA, cuDNN, DOCA).
  • Knowledge of enterprise storage architectures and distributed parallel processing paradigms.

Benefits and Additional Information

  • Base salary range: 272,000 USD - 425,500 USD (determined based on location, experience, and comparable employees).
  • Eligible for equity and company benefits (see the NVIDIA benefits page for details).
  • Applications accepted at least until August 16, 2025.
  • NVIDIA is an equal opportunity employer committed to diversity and inclusion.