Principal Firmware Engineer – Server Manageability and Observability
at NVIDIA
USD 272,000-425,500 per year
Required Skills & Competences
Security @ 4, Linux @ 4, Networking @ 4, System Architecture @ 4, CUDA @ 3
Details
NVIDIA data center systems (DGX, HGX) bring together NVIDIA GPUs, NVLink, InfiniBand networking, NVIDIA Grace CPUs and a fully optimized AI/HPC software stack. We are looking for a strong technical architect to own the end-to-end architecture of these products at the system software level, including firmware, kernel drivers, operating systems, and user-mode drivers. This role will work with internal component leads and engage with major cloud service providers to take these products to market.
Responsibilities
- Serve as the primary technical point of contact for major customers: lead technical discussions, define KPIs, gather requirements, and address complex technical queries.
- Act as a system software architect to lead technical innovation and strategic collaborations with hyperscalers for next-generation data center products.
- Align NVIDIA's roadmap with major customers' requirements through direct engagement.
- Develop and drive adoption of new technologies and protocols relevant to server manageability and observability.
- Make critical technical decisions in ambiguous situations and mitigate program risks through left-shift strategies.
- Lead complex, cross-functional projects and influence outcomes without direct authority.
Requirements
- Deep expertise in scalable and performant server system architecture, focusing on software/hardware interfaces.
- Extensive experience with complex system software for accelerators (GPUs, DPUs, FPGAs).
- Mastery of system firmware (SBIOS, OpenBMC), embedded systems, and Linux kernel internals.
- Proficiency with kernel drivers and both out-of-band and in-band management architectures.
- Experience with device management protocols (MCTP, PLDM, SPDM, RDE) and system management protocols (Redfish, IPMI).
- Extensive knowledge of networking technologies and protocols, including TCP/IP, Ethernet, InfiniBand, and advanced switching and routing concepts.
- Experience collaborating with platform security experts to define tradeoffs between security and usability.
- Demonstrated success leading complex, cross-functional programs to completion and implementing left-shift strategies to de-risk execution.
- BS or MS in Computer Science, Electrical Engineering or related field, or equivalent experience.
- 15+ years in system architecture and design.
Ways to Stand Out
- Knowledge of cloud and cluster-level deployment and management systems.
- Participation or contributions in standards bodies such as OCP and DMTF.
- Familiarity with NVIDIA HPC programming models and libraries (CUDA, cuDNN, DOCA).
- Knowledge of enterprise storage architectures and distributed parallel processing paradigms.
Benefits and Additional Information
- Base salary range: 272,000 USD - 425,500 USD (determined based on location, experience, and comparable employees).
- Eligible for equity and company benefits. (See NVIDIA benefits page.)
- Applications accepted at least until August 16, 2025.
- NVIDIA is an equal opportunity employer committed to diversity and inclusion.