Director, Site Reliability and Software Engineering - DGX Cloud
at Nvidia
USD 320,000-575,000 per year
Used Tools & Technologies
SRE
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others and you know all the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Security @ 6
Linux @ 7
DevOps @ 4
Distributed Systems @ 4
Leadership @ 4
People Management @ 4
Mentoring @ 4
Product Management @ 4
Reporting @ 4
Engineering Management @ 8
Cloud Computing @ 4
GPU @ 4
AI @ 4
Details
NVIDIA is the AI computing company driving modern AI with GPUs. The NVIDIA GPU Cloud (NGC) is a GPU-accelerated platform used by data scientists and researchers to build, train, and deploy neural network models. The DGX Cloud Computing team is looking for a leader to manage software, automation, and operations of multi-colo distributed NVIDIA GPU cloud clusters and contribute to product strategy.
Responsibilities
- Manage a team of Software and Site Reliability engineers, including program development, task planning and code reviews.
- Define team strategy and roadmap; drive adoption of scalable SDLC practices, test infrastructure, and modern engineering practices within NVIDIA’s DGX Cloud Computing environment.
- Drive technical projects and provide leadership in an innovative and fast-paced environment.
- Be responsible for the overall planning, tracking and success of technical projects.
- Work closely with project and product management teams to ensure best-in-class product development.
- Contribute technically to projects for DGX Cloud Computing Services.
- Interact with key internal stakeholders to provide operational and financial clarity on technical spend.
- Drive decision making, visibility and operational rigor across business analytic initiatives such as budget and project & portfolio reporting; lead executive reporting, dashboards, and operational CTO metrics focused on continuous improvement.
Requirements
- 12+ years of overall experience in engineering management; 5+ years of leadership experience.
- Bachelor’s or Master’s degree in Computer Science or equivalent experience.
- Experience designing and implementing large-scale distributed systems.
- Experience with containers, virtualization environments, and cluster solutions; experience managing Technical Support / DevOps teams.
- Strong knowledge of Unix/Linux.
- Experience implementing tools, processes, internal instrumentation, and methodologies, and resolving blockers.
- Demonstrated people management and leadership skills with a proven track record of mentoring and coaching team members.
- Ability to quickly learn and evaluate new technologies and to influence and establish relationships with other software and IT functional groups (development, server, storage, security).
Compensation and Other Details
- Base salary ranges (determined by location, experience, and peer pay):
  - Level 5: 320,000 USD - 488,750 USD
  - Level 6: 384,000 USD - 575,000 USD
- You will also be eligible for equity and benefits (link provided in original posting).
- Applications accepted at least until May 8, 2026.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.