Capacity Operations and Analytics Manager

at NVIDIA

📍 Santa Clara, United States

USD 200,000-322,000 per year

MIDDLE

✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Grafana @ 5
Prometheus @ 5
Kibana @ 5
Tableau @ 5
GCP @ 3
Machine Learning @ 6
Technical Proficiency @ 6
AWS @ 3
Azure @ 3
Communication @ 5
Planning @ 3
IaaS @ 6
Reporting @ 3
Splunk @ 5
Agile @ 3
Cloud Computing @ 7
GPU @ 3

Details

NVIDIA builds pioneering computing platforms used by scientists, researchers, and engineers. The role focuses on managing GPU capacity and compute resources, developing analytics and automation for infrastructure governance, and partnering with cross-functional teams to align capacity with business goals.

Responsibilities

Manage and optimize GPU capacity and other compute resources across various cloud service providers to meet demand and ensure efficient utilization.
Build, develop, and maintain data models, reporting systems, data automation systems, dashboards, and performance metrics to support infrastructure governance and strategic capacity decisions.
Analyze technical and business needs for GPU capacity and compute resources from internal and external teams.
Identify performance bottlenecks in day-to-day usage of compute resources and collaborate with infrastructure teams to resolve them.
Drive infrastructure resource efficiency initiatives in partnership with engineering, finance, and product teams.
Develop and enhance tooling for cloud infrastructure and analytics platforms to optimize resource usage and performance, including automation and potentially applying AI techniques to extract signals and insights from generated data.
Partner with Finance, Product, Service Owners, and Infrastructure Engineering teams to align cloud capacity management with company goals, and develop infrastructure and service-level KPIs that reflect customer satisfaction.
Lead multi-year, budget-based compute resource planning with engineering.

Requirements

Bachelor’s or Master’s degree in Computer Science, Software Engineering, or related field, or equivalent experience.
12+ years of overall experience in cloud computing, specifically managing or sourcing GPU capacity with cloud service providers; experience with large-scale computing operations and planning.
Strong technical proficiency in cloud architecture, development and deployment, and managing large data sets.
Deep understanding of cloud service models (IaaS, PaaS, SaaS) and cloud infrastructure technologies.
Experience with cloud service providers such as AWS, Azure, GCP, and OCI (required).
Demonstrated experience leveraging AI tools and techniques to extract useful signals and insights from data to improve resource usage and automation.
Strong understanding and practical application of statistical modeling and machine learning methodologies to improve operational efficiency and inform capacity decisions.
Proficiency with data analytics, visualization, and monitoring tools such as Kibana, Grafana, Splunk, Prometheus, Tableau, and Plotly.
Excellent communication and interpersonal skills; ability to collaborate across departments and influence strategic decisions.
Ability to operate effectively amid uncertainty and rapidly changing business conditions with an agile mindset and commitment to continuous improvement.

Benefits

Base salary range: 200,000 USD - 322,000 USD (final base salary determined by location, experience, and internal pay equity).
Eligible for equity and company benefits (see NVIDIA benefits page).
Work arrangement: hybrid (posting tagged #LI-Hybrid).

Additional details

Location: Santa Clara, CA, United States.
Employment type: Full time.
Application deadline: open until at least September 9, 2025.
NVIDIA is an equal opportunity employer committed to a diverse work environment.