Used Tools & Technologies
GPU
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Security @ 4
Hiring @ 4
Communication @ 4
Networking @ 4
System Architecture @ 4
AI @ 4
InfiniBand @ 4
HPC @ 4
Details
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today NVIDIA is tapping into the unlimited potential of AI to define the next era of computing. NVIDIA has a rapidly expanding ecosystem of data center platform and node designs, from single-node HGX/DGX systems to large multi-node NVLink domain rack architectures. These designs bring together NVIDIA GPUs, NVIDIA NVLink, NVIDIA InfiniBand networking, NVIDIA Grace CPUs, and a fully optimized NVIDIA AI and HPC software stack.
The team is searching for a highly motivated technical leader to drive the engineering roadmap and innovation for rack system software architecture across firmware, kernel drivers, operating systems, networking, fabrics, user-mode drivers, and manageability software. The role requires working with internal component leads and engaging with hyperscaler/cloud service providers and vendors to take products to market.
Responsibilities
- Drive the software end-to-end architecture for NVIDIA's rack-scale products.
- Maintain deep understanding of the product portfolio and roadmap; translate forward-looking plans into clear, formal software requirements that anchor execution across the organization.
- Ensure high quality and reliable software; serve as a trusted architectural partner to teams requiring guidance or oversight.
- Work directly with major customers to understand their requirements and align their roadmap with NVIDIA's roadmap.
- Work with business partners and vendors to shape their products to meet NVIDIA's needs.
- Develop a roadmap of new technologies and protocols; drive their design and adoption.
- Mentor architects and engineering teams to grow them into future leaders.
- Make key technical decisions even when faced with ambiguity.
Requirements
- BS or MS in Computer Engineering, Computer Science, or a related field, or equivalent experience.
- 15+ years in system architecture and design.
- Deep experience designing architecture for scalable and performant server systems, particularly at the software/hardware interface.
- Strong understanding of networking technologies and protocols (e.g., Ethernet, InfiniBand).
- Previous experience working with complex system software for accelerators such as GPUs, DPUs, or FPGAs.
- Expertise in out-of-band and in-band management architectures.
- Knowledge of system management protocols such as Redfish and IPMI.
- Experience working with platform security experts to define tradeoffs between security and ease of use.
- Demonstrable experience implementing shift-left strategies to de-risk program execution.
- Excellent written and verbal communication skills.
Ways to stand out from the crowd
- Knowledge of large-scale cloud and cluster-level deployment and management systems; experience designing robust, resilient, and performant scale-up fabrics.
- Demonstrated track record of leading data center products across the entire lifecycle: inception, pre-silicon development, post-silicon bring-up, manufacturing, and deployment.
- Familiarity with CXL, UCIe and other chip-to-chip (C2C) technology architectures.
- Knowledge of storage and networking technologies.
Compensation & Benefits
- Base salary range: 320,000 USD - 488,750 USD (final base salary determined based on location, experience, and pay of employees in similar positions).
- Eligible for equity and benefits.
Additional information
- Location: Santa Clara, CA, United States.
- Time type: Full time.
- Applications accepted at least until March 24, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and values diversity in hiring and promotion practices.