Capacity Operations Manager

at NVIDIA
USD 132,000-253,000 per year
MIDDLE SENIOR
✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

  • GCP @ 3
  • Machine Learning @ 6
  • Technical Proficiency @ 6
  • AWS @ 3
  • Azure @ 3
  • Communication @ 5
  • IaaS @ 6
  • Reporting @ 3
  • Cloud Computing @ 1
  • GPU @ 3

Details

Our technology has no boundaries. NVIDIA builds groundbreaking computing platforms that enable scientists, researchers, and engineers to advance their ideas. Our visual computing technology delivers high performance and energy efficiency and powers applications from AI to visualization.

Responsibilities

  • Orchestrate the build out of High Performance Computing (HPC) clusters, working closely with internal and external engineering teams.
  • Manage and optimize GPU capacity and other compute resources across various cloud service providers to meet growing demand and ensure efficient utilization.
  • Build, develop, and maintain data models, reporting systems, data automation systems, dashboards, and performance metrics that support NVIDIA Infrastructure governance programs and strategic capacity decisions.
  • Analyze technical and business needs for GPU capacity and other compute resources from various internal and external teams.
  • Identify performance bottlenecks in day-to-day usage of compute resources and collaborate with infrastructure teams to resolve them.
  • Drive infrastructure resource efficiency initiatives in partnership with engineering, finance, and product teams.
  • Develop and enhance tooling for cloud infrastructure and analytics platforms to optimize resource usage and performance, including automation and leveraging AI techniques to extract insights from generated data.
  • Partner with Finance, Product, Service Owners, and Infrastructure Engineering teams to align cloud capacity management with company goals and develop Infrastructure and Service Level KPIs tied to customer satisfaction.
  • Lead multi-year, budget-based compute resource planning with finance, procurement, and engineering.

Requirements

  • Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field, or equivalent experience.
  • 5+ years of experience in cloud computing, specifically managing or utilizing GPU capacity for high performance computing; proven track record in large-scale computing operations and planning is a plus.
  • Strong technical proficiency in cloud architecture, development and deployment, and managing large data sets.
  • Hands-on experience with command line interfaces and shell scripting languages.
  • Deep understanding of cloud service models (IaaS, PaaS, SaaS) and cloud infrastructure technologies. Hands-on experience with cloud providers such as AWS, Azure, GCP, and OCI is required.
  • Demonstrated experience leveraging AI tools and techniques to extract signals and insights from data to improve resource usage and automation.
  • Strong understanding and practical application of analytics, statistical modeling, and machine learning methodologies to improve operational efficiency and inform strategic capacity decisions.
  • Excellent communication and interpersonal skills; ability to collaborate across departments and influence strategic decisions.
  • Naturally curious, accountable, and responsible, with the ability to operate effectively amid uncertainty and rapidly changing business conditions.

Compensation & Other Details

  • Base salary ranges by level (determined by location, experience, and pay of employees in similar positions):
    • Level 3: 132,000 USD — 207,000 USD
    • Level 4: 160,000 USD — 253,000 USD
  • You will also be eligible for equity and benefits.
  • Applications accepted at least until December 14, 2025.

Company & Equal Opportunity

NVIDIA is leading developments in Artificial Intelligence, High-Performance Computing, and Visualization. NVIDIA is an equal opportunity employer and values diversity across protected characteristics.