Used Tools & Technologies
HPC
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Python @ 4
Algorithms @ 4
Machine Learning @ 7
LLM @ 3
PyTorch @ 4
GPU @ 7
Deep Learning @ 4
AI @ 4
Profiling @ 4
TensorRT @ 4
Performance Analysis @ 4
JAX @ 4
Details
We are now seeking a Senior Deep Learning Performance Architect! NVIDIA is looking for outstanding Performance Architects with a background in performance analysis, performance modeling, and AI/deep learning to help analyze and develop the next generation of architectures that accelerate AI and high-performance computing applications.
Responsibilities
- Develop innovative architectures to extend the state of the art in deep learning performance and efficiency.
- Analyze performance, cost and power trade-offs by developing analytical models, simulators and test suites.
- Understand and analyze the interplay of hardware and software architectures on future algorithms, programming models and applications.
- Evaluate PPA (performance, power, area) for hardware features and system level architectural trade-offs. Develop high level simulators in C++/Python.
- Actively collaborate with software, product and research teams to guide the direction of deep learning HW and SW.
Requirements
- MS or PhD in Computer Science, Computer Engineering, Electrical Engineering or equivalent experience.
- 6+ years of relevant work experience.
- Strong background in GPU or Deep Learning ASIC architecture for distributed training and/or inference spanning multi-chip/multi-node.
- Experience with performance modeling, architecture simulation, profiling, and analysis.
- Solid foundation in machine learning and deep learning, with understanding of modern transformer-based architectures and their performance at scale.
- Strong programming skills in Python, C, C++.
Ways to stand out
- Background with deep neural network training, inference and optimization in leading frameworks (e.g. PyTorch, JAX, TensorRT).
- Familiarity with advanced optimizations and SW/HW co-design in LLM training and inference.
- Exposure to using AI to accelerate software engineering.
- Demonstration of self-motivation and creative / critical thinking.
Compensation and benefits
- Base salary range (determined by location, experience, and pay of employees in similar positions):
  - Level 4: 184,000 USD - 287,500 USD per year
  - Level 5: 224,000 USD - 356,500 USD per year
- You will also be eligible for equity and benefits (see company benefits page).
Additional information
- Applications for this job will be accepted at least until June 7, 2026.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and is committed to fostering an inclusive work environment.