Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others and you know all the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Jenkins @ 4
Linux @ 6
Python @ 6
CI/CD @ 4
Leadership @ 6
Communication @ 4
Product Management @ 4
JSON @ 3
OAuth @ 3
Reporting @ 4
System Architecture @ 7
LLM @ 4
Compliance @ 4
GPU @ 4
AI @ 4
Performance Analysis @ 7
Details
NVIDIA is seeking an experienced senior manager to drive software and firmware releases for rack-scale GPU-based datacenter servers (DGX, HGX, MGX). The Datacenter Software Tools team delivers infrastructure and tools for datacenter deployment, firmware and software package deployment, and server manageability. This role focuses on release management, CI/CD automation, packaging, validation, and cross-functional collaboration to ensure high-quality, secure, and scalable releases.
Responsibilities
- Lead technical direction for how releases are delivered to end customers of rack-scale computing (compute and switch trays), building end-to-end infrastructure and workflows to ensure high-quality firmware and software releases.
- Define release scope for rack-scale products, working cross-functionally with product management, technical architects, and program management; ensure releases flow through validation matrices for customer use cases.
- Influence architecture, design, and implementation decisions for compute and switch tray software and firmware; ensure quality across nightly, dev, and production drops with appropriate release-validation strategies.
- Partner with developers, SWQA, and product engineering to shift release quality left, enforce quality metrics, track KPIs (e.g., MTTR, release cadence), and report release progress to stakeholders.
- Own ingestion and packaging of software and firmware binaries for deployment across multiple platforms and CSP environments.
- Document procedures, refine release workflows, identify and remove bottlenecks in packaging and deployment, and shape the team's roadmap (including self-service interfaces, automation, AI-assisted validation/triage, and compliance reporting).
- Continuously identify improvement opportunities in release processes, infrastructure, and practices with a strong focus on automation and measurable targets.
Requirements
- 12+ years overall in the software industry with specialization in system software and/or firmware development.
- 5+ years of proven hands-on technical leadership of multi-team organizations across datacenter firmware (BMC, FPGA, CPLD, network switches), including building infrastructure to continuously improve release quality.
- BS/MS/PhD in CS, CE, EE, or a related technical field, or equivalent experience.
- Prior experience in systems software or firmware development with a history of guiding complex software features or products through the full product life cycle, ideally on rack-scale datacenter products.
- Strong understanding of computer system architecture, operating systems principles, HW-SW interactions, and performance analysis/optimizations.
- Working fluency in Python and Linux sufficient to review designs, prototype tooling, and debug production issues alongside the team.
- Hands-on experience with web application frameworks and CI/CD platforms (Jenkins, GitLab, Artifactory).
- Track record of balancing multiple projects and delivering against measurable benchmarks (MTTR, specification compliance, release cadence, automation coverage).
- Excellent communication and collaboration skills across teams and time zones.
Ways to stand out
- Familiarity with datacenter server software architecture and in-band/out-of-band management of firmware and hardware components.
- Understanding of REST-style APIs (JSON over HTTPS with OAuth) and familiarity with DMTF firmware management standards such as PLDM and SPDM.
- Proven experience developing a self-service release infrastructure that reduces onboarding SLA times.
- Experience integrating AI/LLM tooling into engineering workflows for triage, test generation, code review, or release validation.
- Experience leading geographically distributed engineering teams across US and APAC.
Benefits
- Base salary range (dependent on location and experience): 272,000 USD - 431,250 USD.
- Eligible for equity and company benefits.
Additional notes
- Applications accepted at least until May 4, 2026.
- NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to diversity and inclusion.