Principal Firmware Engineer – Server Manageability and Observability

at NVIDIA
USD 272,000-425,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Security @ 4
  • Linux @ 4
  • Networking @ 4
  • System Architecture @ 4
  • CUDA @ 3

Details

NVIDIA data center systems, such as DGX and HGX, have become core to NVIDIA's rapidly growing enterprise and cloud provider businesses. These platforms bring together the full power of NVIDIA GPUs, NVIDIA NVLink, NVIDIA InfiniBand networking, NVIDIA Grace CPUs, and a fully optimized NVIDIA AI and HPC software stack. This is a senior technical architect role that owns end-to-end architecture at the system software level, spanning firmware, kernel drivers, operating systems, and user-mode drivers. You will work with internal component leads and engage with industry-leading cloud service providers to bring these products to market.

Responsibilities

  • Serve as the primary technical point of contact for major customers: lead technical discussions, define KPIs, gather requirements, and address complex technical queries.
  • Act as system software architect, leading technical innovation and strategic collaborations with major hyperscalers to architect next-generation data center products.
  • Align NVIDIA's roadmap with major customers' requirements through direct engagement.
  • Develop and drive adoption of new technologies and protocols.
  • Make critical technical decisions in ambiguous situations and mitigate risks through left-shift strategies.

Requirements

  • Deep expertise in scalable, performant server system architecture, with a focus on software/hardware interfaces.
  • Extensive experience with complex system software for accelerators (GPUs, DPUs, FPGAs).
  • Mastery of system firmware (SBIOS, OpenBMC), embedded systems, and Linux kernel internals.
  • Proficiency in out-of-band and in-band management architectures, device management protocols (e.g., MCTP, PLDM, SPDM, RDE), and system management protocols (e.g., Redfish, IPMI).
  • Extensive knowledge of networking technologies and protocols, including TCP/IP, Ethernet, InfiniBand, and advanced switching and routing concepts.
  • Experience collaborating with platform security experts to define the tradeoffs between security and ease of use.
  • Demonstrated success leading complex, cross-functional projects to completion and influencing outcomes without direct authority; experience implementing left-shift strategies to de-risk program execution.
  • BS or MS degree in Computer Science, Electrical Engineering, or related field (or equivalent experience).
  • 15+ years of experience in system architecture and design.

Ways to Stand Out

  • Knowledge of cloud and cluster-level deployment and management systems; participation and contributions in standards bodies such as OCP and DMTF.
  • Familiarity with NVIDIA HPC programming models and libraries (CUDA, cuDNN, DOCA).
  • Knowledge of enterprise storage architectures and distributed parallel processing paradigms.

Benefits & Other Information

  • Base salary range: 272,000 USD - 425,500 USD (determined based on location, experience, and pay of employees in similar positions).
  • Eligible for equity and additional benefits.
  • Applications for this job will be accepted at least until October 22, 2025.

NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. The company does not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.