Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — expert. You can teach others and know the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Grafana @ 5
Prometheus @ 5
Kibana @ 5
Tableau @ 5
GCP @ 3
Machine Learning @ 6
Technical Proficiency @ 6
AWS @ 3
Azure @ 3
IaaS @ 6
Reporting @ 3
Splunk @ 5
Agile @ 3
Cloud Computing @ 7
GPU @ 3
AI @ 3
Details
Our technology has no boundaries! NVIDIA is building the world’s most groundbreaking computing platforms. Because of our work, scientists, researchers, and engineers can advance their ideas. At its core, our visual computing technology not only enables an outstanding computing experience but is also energy efficient. We pioneered a supercharged form of computing loved by the most fast-paced computer users in the world: scientists, designers, artists, and gamers.
Responsibilities
- Manage and optimize GPU capacity and other compute resources across various cloud service providers to meet growing demands and ensure efficient utilization.
- Build, develop, and maintain data models, reporting systems, data automation systems, dashboards, and performance metrics that support NVIDIA Infrastructure governance programs and strategic capacity decisions.
- Analyze the technical and business needs for GPU capacity and other compute resources from various internal and external teams.
- Identify performance bottlenecks in day-to-day usage of compute resources and collaborate with relevant infrastructure teams to resolve them.
- Drive infrastructure resource efficiency initiatives in partnership with engineering, finance, and product teams.
- Develop and enhance tooling for cloud infrastructure and analytics platforms to optimize resource usage and performance, including automating workflows and potentially leveraging AI techniques to extract useful signals and insights from generated data.
- Partner with Finance, Product, Service Owners, and Infrastructure Engineering teams to align cloud capacity management with company goals and develop Infrastructure and Service Level KPIs that reflect customer satisfaction.
- Lead multi-year budget-based compute resource planning with engineering.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field (or equivalent experience).
- 10+ years of overall experience in cloud computing, specifically in managing or sourcing GPU capacity with cloud service providers. A proven track record of large-scale computing operations and planning is a plus.
- Strong technical proficiency in cloud architecture, development and deployment, and managing large data sets.
- Deep understanding of cloud service models (IaaS, PaaS, SaaS) and cloud infrastructure technologies. Experience with Cloud Service Providers such as AWS, Azure, GCP, and OCI is required.
- Demonstrated experience in employing AI tools and techniques to extract useful signals and insights from data to improve resource usage and automation.
- Strong understanding and practical application of statistical modeling and machine learning methodologies for improving operational efficiency and informing strategic capacity decisions.
- Proficiency with data analytics, visualization, and monitoring tools such as Kibana, Grafana, Splunk, Prometheus, Tableau, and Plotly.
- Ability to operate effectively amidst uncertainty and rapidly changing business conditions, with an agile mindset and a commitment to ongoing improvement.
Benefits
- Base salary range: 168,000 USD - 270,250 USD (final base salary determined by location, experience, and pay of employees in similar positions).
- Eligible for equity and company benefits (see NVIDIA benefits pages).
Additional information
- Applications accepted at least until June 1, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and committed to fostering an inclusive work environment.