Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Python @ 4
LLM @ 4
PyTorch @ 4
CUDA @ 4
GPU @ 3
Deep Learning @ 4
AI @ 4
Performance Analysis @ 4
JAX @ 4
Details
We are now looking for a Senior High-Performance LLM Training Engineer!
NVIDIA is seeking experienced engineers specializing in performance analysis and optimization to improve the efficiency of LLM training workloads, which are shaping the world's most advanced computing systems. This position focuses on optimizing NVIDIA’s LLM software stack in frameworks like PyTorch and JAX for high-performance training on thousands of GPUs, while also helping shape hardware roadmaps for the next generation of GPUs powering the AI revolution.
Responsibilities
- Understand, analyze, profile, and optimize AI training workloads on innovative hardware and software platforms.
- Understand the big picture of training performance on GPUs, prioritizing and then solving problems across all state-of-the-art neural networks.
- Implement production-quality software in multiple layers of NVIDIA's deep learning platform stack, from drivers to DL frameworks.
- Build and support NVIDIA submissions to the MLPerf Training benchmark suite.
- Implement key DL training workloads in NVIDIA's proprietary processor and system simulators to enable future architecture studies.
- Build tools to automate workload analysis, workload optimization, and other critical workflows.
Requirements
- PhD in Computer Science, Electrical Engineering, or Computer Engineering with 5+ years of meaningful work experience, or MS (or equivalent experience) with 8+ years of meaningful work experience.
- Strong background in deep learning and neural networks, in particular training.
- A deep background in computer architecture and familiarity with the fundamentals of GPU architecture.
- Proven experience analyzing and tuning application performance, and in processor- and system-level performance modeling.
- Programming skills in C++, Python, and CUDA.
Compensation and Benefits
- The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
- You will also be eligible for equity and benefits.
Other details
- Applications for this job will be accepted at least until April 12, 2026.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer.