Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Python @ 6
CUDA @ 6
GPU @ 3
Deep Learning @ 4
AI @ 4
Performance Analysis @ 4
Details
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today, NVIDIA is tapping into the unlimited potential of AI to define the next era of computing. This role is for senior engineers obsessed with performance analysis and optimization who want to help squeeze every last clock cycle out of AI training across hardware and software stacks.
Responsibilities
- Understand, analyze, profile, and optimize AI training workloads on new hardware and software platforms, identifying fundamental performance limiters.
- Prioritize and solve performance issues across key AI model training tasks, pushing end-to-end performance toward physical limits.
- Implement production-quality software across multiple layers of NVIDIA's deep learning platform stack, from drivers to deep learning frameworks.
- Build and support NVIDIA submissions for MLPerf Training benchmarks.
- Implement key deep learning training workloads in NVIDIA's proprietary processor and system simulators to enable future architecture studies.
- Develop tools to automate workload analysis, optimization, and other critical workflows.
Requirements
- PhD in CS, EE, or CSEE (or equivalent experience) with 5+ years of relevant experience, or MS with 8+ years of experience.
- Strong background in deep learning and neural networks, particularly in training.
- Solid understanding of computer architecture and familiarity with GPU fundamentals.
- Proven background in analyzing and tuning application performance.
- Proven experience with processor and system-level performance modeling.
- Proficiency in programming with C++, Python, and CUDA.
Compensation and Benefits
- NVIDIA offers highly competitive salaries, equity eligibility, and a comprehensive benefits package (see https://www.nvidiabenefits.com/ and https://www.nvidia.com/en-us/benefits/).
- Base salary ranges: 184,000 USD - 287,500 USD for Level 4; 224,000 USD - 356,500 USD for Level 5.
Additional Information
- Applications for this job will be accepted at least until February 16, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and does not discriminate on the basis of protected characteristics.