Senior Technical Program Manager, AI and ML Software

at NVIDIA
USD 160,000-304,750 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Software Development @ 4
  • DevOps @ 4
  • Communication @ 4
  • Prioritization @ 4
  • Project Management @ 4
  • Reporting @ 4
  • Agile @ 4
  • GPU @ 4

Details

Hardware Infrastructure is seeking a Senior Technical Program Manager to own the strategy and execution of programs that support the bringup, operations and automation of GPU infrastructure. The GPU infrastructure we build and operate enables NVIDIA's most sophisticated AI and hardware researchers and engineers to invent the future of computing. This is a fast-paced and evolving landscape that requires a senior TPM leader to guide engineering roadmaps to delivery with high-quality outcomes and a strong foundation of operational perfection. You will partner internally within Hardware Infrastructure and externally with senior management and partner teams to scale the cluster operations charter, and you will develop and standardize the planning, reporting and execution methodologies and metrics needed to meet challenging objectives.

Responsibilities

  • Engage with cross-company partners to compose technical strategy, build programs and coordinate execution to meet key business objectives that support scaling bringups to be seamless, fast and efficient.
  • Nurture a culture of continuous improvement; identify opportunities across tooling, automation and processes to scale cluster operations and management.
  • Guide a diverse set of engineering efforts in an agile program methodology across planning, prioritization, design, dependency management, implementation and execution.
  • Bring a data-first approach to programs (metrics, OKRs, KPIs) to effectively measure program success and identify areas of improvement.
  • Create effective communication channels that give audiences at varying levels insight into program status, risks and opportunities.
  • Act as an effective technical and non-technical liaison between developers, customers and partners to drive organizational alignment across a multi-functional, matrixed set of leads.

Requirements

  • B.S. (or equivalent experience) in Computer Science or a related technical discipline.
  • 10+ years of experience across software engineering and/or technical program management roles, with demonstrated expertise in and mastery of technical and management practices.
  • Demonstrated experience with infrastructure software, production application software development and large-scale distributed computing.
  • Experience leading large-scale HPC and/or AI infrastructure deployments that span hardware and software.
  • Outstanding communication and presentation abilities suited for a wide range of technical and non-technical audiences.
  • Strong multitasking abilities with a focus on thoroughness and rapid context switching.
  • Knowledge of agile methodologies and best-in-class project management tools.
  • Proactive and enthusiastic in identifying and implementing positive changes in software engineering and release management within a fast-paced environment.

Ways To Stand Out

  • Prior experience bringing up new datacenter capacity across cloud service providers and on-premises locations.
  • Prior experience migrating platforms and solutions from on-prem to cloud.
  • Background working with AI researchers and/or EDA developers.
  • Experience with software development, release and support methodology and DevOps.

Compensation & Benefits

  • Base salary ranges provided by level:
    • Level 4: 160,000 USD - 253,000 USD
    • Level 5: 192,000 USD - 304,750 USD
  • You will also be eligible for equity and benefits (see NVIDIA benefits).

Applications for this job will be accepted at least until October 11, 2025.

NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

#deeplearning