Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
GCP @ 3
Machine Learning @ 6
Technical Proficiency @ 6
AWS @ 3
Azure @ 3
Communication @ 6
IaaS @ 3
Reporting @ 3
Agile @ 3
Cloud Computing @ 6
GPU @ 3
AI @ 3
HPC @ 3
Details
Our technology is limitless! NVIDIA is developing the world’s most innovative and groundbreaking computing platforms. Due to our work, scientists, researchers, and engineers are able to advance their ideas. At its essence, our visual computing technology offers not only an outstanding computing experience but also energy efficiency. We led the way in a supercharged style of computing embraced by the fastest-moving computer users globally—scientists, designers, artists, and gamers.
Responsibilities
- Coordinate the development of High Performance Computing (HPC) clusters, collaborating closely with internal and external engineering teams.
- Direct and improve GPU capacity and additional compute resources across diverse cloud service platforms to satisfy rising needs and secure efficient deployment.
- Design, improve, and manage data models, reporting platforms, data automation solutions, dashboards, and performance measures that back NVIDIA Infrastructure governance programs and strategic capacity decisions.
- Assess the technical and business requirements for GPU capacity and other compute resources from different internal and external groups.
- Identify performance bottlenecks in day-to-day usage of compute resources and collaborate with relevant infrastructure teams to resolve them.
- Drive infrastructure resource efficiency initiatives in partnership with engineering, finance, and product teams.
- Develop and enhance tooling for our cloud infrastructure and analytics platform to optimize resource usage and performance for NVIDIA and its customers, including automating workflows and applying AI techniques to extract signals and insights from generated data.
- Collaborate with Finance, Product, Service Owners, and Infrastructure Engineering teams to align cloud capacity management with company goals and to develop infrastructure and service-level benchmarks that match customer satisfaction.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field, or equivalent experience.
- 8+ years of overall experience in cloud computing, specifically in managing or using GPU capacity for high performance computing. Experience with large-scale computing operations and planning is a plus.
- Strong technical proficiency in cloud architecture, development and deployment, and managing large data sets.
- Experience with command line interfaces and shell scripting languages.
- Comprehensive knowledge of cloud service models (IaaS, PaaS, SaaS) and cloud infrastructure technologies.
- Practical experience with Cloud Service Providers including AWS, Azure, GCP, and OCI.
- Demonstrated experience using AI tools and techniques to extract useful signals and insights from data to improve resource usage and automation.
- Deep knowledge and active use of statistical modeling and machine learning approaches for boosting operational efficiency and supporting strategic capacity decisions.
- Understanding of analytics, statistical modeling, and machine learning methodologies.
- Strong communication and relationship-building skills, with the ability to work well across different departments and contribute to strategic decisions.
- Self-starter who is self-motivated, focused, and self-sufficient, with a willingness to take on new challenges and adapt quickly in a dynamic environment.
- Ability to operate effectively amidst uncertainty and rapidly changing business conditions, with an agile approach and a commitment to ongoing improvement.
Compensation & Benefits
- Base salary ranges (dependent on location, experience, and level):
  - Level 4: 136,000 USD - 218,500 USD
  - Level 5: 176,000 USD - 276,000 USD
- Eligible for equity and company benefits (link to NVIDIA benefits referenced in the posting).
Additional Information
- Applications for this job will be accepted at least until March 24, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. The company does not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.