Required Skills & Competences
Software Development @ 4, Docker @ 4, Linux @ 4, Python @ 4, TensorFlow @ 4, Bash @ 4, Networking @ 4, Performance Monitoring @ 4, System Architecture @ 7, PyTorch @ 4, CUDA @ 4, GPU @ 4
Details
As NVIDIA expands its presence in the datacenter business, this team focuses on maximizing the performance and power efficiency of deep learning applications on datacenter-class hardware, and on establishing data-driven approaches to hardware design and system software development.
Responsibilities
- Develop software infrastructure to characterize and analyze a broad range of Deep Learning applications.
- Evolve cost-efficient datacenter architectures tailored to the needs of Large Language Models (LLMs).
- Work with experts to develop analysis and profiling tools in Python, Bash, and C++ that measure key performance metrics of DL workloads running on NVIDIA systems.
- Analyze the system and software characteristics of DL applications, including how CPU, GPU, networking, and I/O interact with DL workloads.
- Develop analysis tools and methodologies to measure key performance metrics and estimate potential for efficiency improvement.
Requirements
- Bachelor’s degree in Electrical Engineering or Computer Science or equivalent experience (Master's or PhD preferred).
- 8 years or more of relevant experience.
- Experience in at least one of:
  - System Software: Operating Systems (Linux), Compilers, GPU kernels (CUDA), DL Frameworks (PyTorch, TensorFlow).
  - Silicon Architecture and Performance Modeling/Analysis: CPU, GPU, Memory, or Network Architecture.
- Programming experience in C/C++ and Python; exposure to Bash scripting.
- Exposure to containerization platforms (Docker) and datacenter workload managers (Slurm) is a plus.
- Deep understanding of computer system architecture and performance analysis with demonstrated hands-on experience.
- Demonstrated ability to work in virtual/multi-site environments and to take ownership from start to finish.
Ways to stand out
- Background in system software, OS internals, GPU kernels (CUDA), or DL frameworks (PyTorch, TensorFlow).
- Experience with silicon performance monitoring or profiling tools (e.g., perf, gprof, nvidia-smi, DCGM).
- In-depth performance modeling experience in CPU, GPU, Memory or Network Architecture.
- Exposure to containerization platforms (Docker) and datacenter workload managers (Slurm).
- Prior experience working with multi-site or cross-functional teams.
Compensation & Additional Info
- Base salary ranges (determined by location, experience, and comparable roles):
  - Level 4: 184,000 USD - 287,500 USD
  - Level 5: 224,000 USD - 356,500 USD
- Eligible for equity and benefits.
- Applications accepted at least until July 29, 2025.
- #LI-Hybrid
Benefits
- NVIDIA benefits (details available on NVIDIA website).