Senior DGX AI Cloud Performance Analysis Tools Engineer
at NVIDIA
Santa Clara, United States
USD 184,000-356,500 per year
Required Skills & Competences
Python @ 4, TensorFlow @ 4, Performance Optimization @ 4, Data Analysis @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4
Details
Joining NVIDIA's DGX Cloud AI Efficiency Team means contributing to the infrastructure that powers our innovative AI research. This team focuses on optimizing efficiency and resiliency of AI workloads, as well as developing scalable AI and data infrastructure tools and services. The objective is to deliver a stable, scalable environment for AI researchers, providing them with the necessary resources and scale to foster innovation. The role is focused on designing and developing tools for AI application performance analysis to enable AI researchers to work efficiently with a wide variety of DGX Cloud AI systems and to identify opportunities for performance optimization.
Responsibilities
- Develop AI performance tools for large-scale AI systems providing real-time insight into application performance and system bottlenecks.
- Conduct in-depth hardware-software performance studies.
- Define performance and efficiency evaluation methodologies.
- Automate performance data analysis and visualization to convert profiling data into actionable optimizations.
- Support deep learning software engineers and GPU architects in their performance analysis efforts.
- Collaborate with various teams at NVIDIA to incorporate and influence the latest technologies for GPU performance analysis.
Requirements
- 8+ years of experience in software infrastructure and tools.
- BS or higher degree in computer science or similar (or equivalent experience).
- Strong programming skills in multiple languages, including C++ and Python.
- Solid foundation in operating systems and computer architecture.
- Outstanding ability to understand users, prioritize among many contending requests, and build consensus.
- Passion for "it just works" automation, eliminating repetitive tasks, and enabling team members.
Ways to stand out
- Experience working with large-scale AI clusters.
- Experience with CUDA and GPU computing systems.
- Hands-on experience with deep learning frameworks (TensorFlow, PyTorch, JAX/XLA, etc.).
- Deep understanding of software performance analysis and optimization processes.
Compensation & Benefits
- Base salary ranges by level:
  - Level 4: 184,000 USD - 287,500 USD
  - Level 5: 224,000 USD - 356,500 USD
- You will also be eligible for equity and benefits.
Additional information
- Applications for this job will be accepted at least until July 29, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.