Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Security @ 4
DevOps @ 4
Distributed Systems @ 7
Leadership @ 7
Communication @ 4
Performance Optimization @ 4
GPU @ 4
Observability @ 4
AI @ 4
Details
NVIDIA is seeking a Senior Director, System Software Engineering to lead strategy and execution for capacity management in DGX Cloud, building the capacity foundation for NVIDIA's internal AI research clusters. This leader will shape the roadmap for scalable system software that automates GPU management at scale, drive execution across teams and functions, and partner closely with architecture, security, product, and developer platform leaders to deliver reliable, high-performance software that powers the next generation of accelerated computing.
Responsibilities
- Define and drive the system software strategy for capacity management and automation for DGX Cloud's GPU cloud platforms, aligning long-range technical direction with business and product priorities.
- Lead engineering leaders responsible for core platform capabilities such as runtime software, host and cluster management, provisioning, observability, reliability, security, and performance optimization.
- Build a strong execution model across planning, architecture reviews, release readiness, quality, and operational excellence for software delivered across on-prem and cloud environments.
- Partner closely with security, DevOps, research, and product organizations to translate platform requirements into scalable software roadmaps and high-quality releases.
- Establish measurable goals for engineering efficiency, service reliability, software quality, and customer impact, using data to continuously improve delivery and operations.
- Attract, develop, and retain world-class engineering leaders while fostering technical excellence, accountability, inclusion, and innovation.
Requirements
- BS, MS, or PhD in Computer Science, Computer Engineering, or a related technical field, or equivalent experience.
- 16+ years overall of relevant management experience in system software, platform software, or distributed systems engineering, including 7+ years of significant experience leading engineering organizations.
- Deep technical expertise in operating systems, distributed systems, platform architecture, cloud infrastructure, or large-scale systems software.
- Demonstrated experience leading delivery of complex software platforms spanning reliability, performance, scalability, security, and observability.
- Strong record of leadership and influence across engineering, product, program management, and executive teams.
- Demonstrated success building and leading high-performing teams, developing leaders, and scaling organizations through growth and change.
- Excellent technical communication and decision-making, with the ability to connect architecture choices to business outcomes.
- Demonstrated experience with industry-leading AI tools that help engineers and engineering leaders work more efficiently.
Ways to stand out from the crowd
- Experience with AI infrastructure, accelerated computing, GPU-optimized software stacks, or large-scale training and inference environments.
- Experience leading platform software for cloud-native or hybrid-cloud deployments.
- Track record of driving architectural simplification and operational excellence across large, complex engineering portfolios.
- Experience partnering with open-source communities and ecosystem partners on platform adoption and enablement.
Compensation and benefits
- Base salary range: 384,000 USD - 575,000 USD (determined based on location, experience, and pay of employees in similar positions).
- Eligible for equity and benefits (see https://www.nvidia.com/en-us/benefits/ for details).
Additional information
- Applications for this job will be accepted at least until May 30, 2026.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment.