Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Security @ 4
Leadership @ 4
Networking @ 4
Performance Optimization @ 4
Microservices @ 4
System Architecture @ 4
CUDA @ 4
GPU @ 4
AI @ 4
NVLink @ 4
Details
NVIDIA DGX systems are the foundation of advanced AI infrastructure—purpose-built servers, workstations, and personal AI computers that bring together GPUs, CPUs, NVLink, NVIDIA Networking, and a fully optimized AI software stack.
We are seeking an engineering leader responsible for end-to-end delivery of every DGX compute system—from firmware through the AI stack to customer deployment. You will ensure each DGX product ships as a production-ready system where firmware, OS, drivers, CUDA, networking, and AI applications work together seamlessly, while driving architecture and roadmap for next-generation platforms.
Responsibilities
- End-to-end stack readiness: Ensure every DGX platform is ready for the full NVIDIA software stack (firmware, DGX OS, GPU drivers, CUDA toolkit, DCGM, DOCA/OFED, and management tools) as a validated, production-quality product. Own the GA software/firmware release process, delivering firmware bundles, BaseOS ISOs, and release notes to OEM/OSV partners. Ensure platforms support AI agents such as NemoClaw and Hermes agents, NIM microservices, and the workloads customers expect out of the box.
- Platform firmware development: Lead development of the manageability firmware stack (BMC, BIOS, SBIOS) for all DGX platforms. Ensure firmware from partner teams (GPU, CPU, networking) integrates correctly at system level. Manage third-party vendors and drive platform requirements across all firmware areas.
- Validation strategy: Define a validation strategy that proves each DGX platform is production-ready. It should cover end-to-end system validation, including firmware regression, NVQual certification, DL workload performance, OS/CUDA stack testing, multi-user scenarios, power/thermal validation, and field upgrade reliability. Establish quality gates and a zero-ship-stopper discipline.
- Platform bring-up & architecture: Drive platform bring-up for each new DGX system—coordinating first boot across new silicon (CPU, GPU), board design, and firmware teams. Own architectural strategy for next-generation platforms including firmware update mechanisms, system security posture, and AI application readiness.
- Customer deployment & enablement: Ensure firmware release flows meet CSP and enterprise deployment requirements. Represent DGX platform readiness in executive reviews and strategic planning with VP/SVP leadership. Engage with industry standards bodies (DMTF Redfish, OCP).
- Product delivery lifecycle: Own the complete DGX delivery lifecycle—system architecture, firmware development, integration, full-stack validation, GA release, and customer deployment—for every DGX product.
- Cross-org alignment: Serve as single point of accountability for DGX platform readiness across NVIDIA—aligning GPU, CPU, networking, security, OS, and AI software teams to deliver on schedule.
- Quality & vendor management: Own root cause and corrective action (RCCA) processes for field issues. Manage external vendor partnerships (AMI for SBIOS, BMC contributors) with clear quality gates and program tracking.
- Team leadership: Build and lead a world-class engineering organization. Mentor and develop leaders. Foster a culture of technical excellence, intellectual honesty, and customer obsession.
Requirements
- BS or MS in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
- 12+ years overall in systems firmware/software engineering, including 5+ years in engineering leadership.
- Deep expertise in the server system stack, including SBIOS, BMC, OS, and applications, and in system-level integration of complex multi-component products.
- Proven track record delivering multi-generation server or data center platforms from architecture through customer deployment.
- Experience managing engineering organizations across multiple geographies in a matrix environment.
- Strong understanding of server hardware: CPU, GPU, interconnect, memory, PCIe, power delivery.
- Experience owning end-to-end product quality—from firmware validation through full-stack system testing to field deployment.
Ways to stand out
- Experience with NVIDIA DGX or other GPU-accelerated server platforms. Track record of driving server bring-up for new silicon and system architecture redesigns.
- Familiarity with DMTF Redfish, OCP standards, and server manageability ecosystems.
- Experience with AI/DL workload validation and performance optimization at the platform level.
- Demonstrated ability to operate at VP/SVP level, influencing cross-BU strategic decisions.
Compensation & additional information
- Base salary range: 320,000 USD - 488,750 USD (determined based on location, experience, and pay of employees in similar positions).
- You will also be eligible for equity and benefits (link to NVIDIA benefits referenced in the original posting).
- Applications for this job will be accepted at least until April 25, 2026.
- NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to diversity.