Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level from 1 to 10.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Python @ 7
Algorithms @ 4
Machine Learning @ 4
Leadership @ 4
Communication @ 7
Mathematics @ 7
Parallel Programming @ 7
Technical Leadership @ 4
PyTorch @ 6
CUDA @ 4
GPU @ 4
AI @ 4
HPC @ 4
Details
NVIDIA is seeking a Senior HPC Performance Engineer to join teams building next-generation scientific machine learning (ML) frameworks. The role focuses on accelerating AI for Science across domains such as digital biology, using HPC and ML methods to advance NVIDIA's capabilities.
Responsibilities
- Design and implement computationally performant features for large-scale, CUDA-backed ML training frameworks, using low-level acceleration and scaling strategies such as kernel design, GPU porting, data structure innovations, and distributed learning technologies.
- Optimize computational performance of a wide range of business-critical ML models via accelerated hardware and software stacks, and through algorithmic improvements.
- Develop and maintain the HPC software stack for atomistic modeling and generative machine learning in digital biology and other domains.
- Collaborate with multiple HPC, AI infrastructure, and research teams.
- Drive testing and maintenance of algorithms and software modules.
Requirements
- Advanced degree in a quantitative field (Computer Science, Computational Biophysics, Computational Chemistry, Physics, Mathematics) or equivalent experience.
- 5+ years of relevant experience.
- Consistent track record in performance engineering, software design, building and packaging, and launching software products with a focus on acceleration.
- Deep understanding of parallel programming in C++ and Python; programming experience with CUDA or OpenAI Triton.
- Fluent in modern machine learning frameworks such as PyTorch, JAX, and Warp.
- Experience applying HPC solutions to research problems in biology or chemistry, including atomistic simulations.
- Demonstrated technical leadership, self-direction, and the ability to teach and learn from others.
- Strong communication skills; organized, self-motivated, and a collaborative team player.
Ways to stand out
- Contributions to major AI for Science codebases with acceleration features such as new kernels.
- Familiarity with pioneering language and geometric models used in AI for Science applications in biology and chemistry.
Compensation and benefits
- Base salary range: 184,000 USD - 287,500 USD (final base depends on location, experience, and pay of employees in similar positions).
- Eligible for equity and benefits.
Additional information
- Applications accepted at least until February 21, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and values diversity in its workforce.