Senior Technical Program Manager - GPU Clusters

at NVIDIA

📍 Santa Clara, United States

$188,000-299,000 per year

SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Software Development @ 4
  • DevOps @ 4
  • Communication @ 4
  • Prioritization @ 4
  • Project Management @ 4
  • Reporting @ 4
  • Agile @ 4

Details

Hardware Infrastructure is seeking a Senior Technical Program Manager to lead the strategy and execution of programs that support the bringup, operations and automation of GPU infrastructure. The GPU infrastructure we build and operate enables NVIDIA's most advanced AI and hardware researchers and engineers to create the future of computing. This is a fast-paced, evolving landscape that requires a senior TPM leader to guide engineering roadmaps toward high-quality outcomes built on a strong foundation of operational excellence. They will partner internally within Hardware Infrastructure and externally with senior management and partner teams to scale the cluster operations charter. They will also develop and standardize planning, reporting and execution methodologies and metrics to help meet these challenging objectives.

Responsibilities

  • Engage with cross-company partners to shape the technical strategy, build programs and coordinate execution to meet key business objectives that support scaling bringups to be seamless, fast and efficient.
  • Nurture a culture of continuous improvement, finding new opportunities across tooling, automation and processes to scale cluster operations and management.
  • Guide a diverse set of engineering efforts using an agile program methodology across planning, prioritization, design, dependency management, implementation and execution.
  • Bring a data-first approach to programs (metrics, OKRs, KPIs) to measure program success effectively and identify areas for improvement.
  • Create effective communication channels that give audiences at varying levels insight into program status, risks and opportunities.
  • Act as an effective technical and non-technical liaison between developers, customers and partners to drive organizational alignment across a multi-functional, matrixed set of leads.

Requirements

  • B.S. (or equivalent experience) in Computer Science or a related technical discipline.
  • 12+ years of experience across software engineering and/or technical program management roles, with demonstrated expertise in and mastery of technical and management practices.
  • Prior experience in infrastructure software, production application software development and large-scale distributed computing.
  • Experience managing large-scale HPC and/or AI infrastructure deployments that span hardware and software.
  • Exceptional communication and presentation skills for diverse technical and non-technical audiences.
  • Strong multitasking abilities with a focus on thoroughness and rapid context switching.
  • Knowledge of agile methodologies and best-in-class project management tools.
  • Proactive and enthusiastic in identifying and implementing positive changes in software engineering and release management within a fast-paced environment.

Ways To Stand Out From The Crowd

  • Prior experience bringing up new datacenter capacity across cloud service providers and on-premises locations.
  • Prior experience migrating platforms and solutions from on-premises to cloud.
  • Prior experience in working with AI researchers and/or EDA developers.
  • Experience with software development, release and support methodologies, and DevOps.

NVIDIA is a “learning machine” that constantly evolves by adapting to new opportunities that are hard to tackle, that only we can pursue, and that matter to the world. This is our life’s work: to amplify human creativity and intelligence. NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most brilliant and hardworking people in the world working for us. If you're creative, autonomous, and love a challenge, we want to hear from you. Come join our team and help build the real-time, efficient computing platform driving our success in this exciting and quickly growing field.