Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Python @ 4
CI/CD @ 4
Machine Learning @ 3
Mathematics @ 4
API @ 4
PyTorch @ 4
CUDA @ 4
GPU @ 4
Deep Learning @ 4
AI @ 4
Profiling @ 4
HPC @ 4
JAX @ 4
Details
NVIDIA BioNeMo is building the computational foundation for the next generation of biological discovery. The cuEquivariance team develops an NVIDIA library that accelerates geometric neural networks on NVIDIA GPUs, enabling researchers in molecular biology, materials science, and physics to train and deploy equivariant models at scale. The team ships production GPU kernels and software interfaces used across these scientific fields. The role spans CUDA kernel engineering, Python library development (PyTorch and JAX), and close collaboration with research teams and external framework developers.
Responsibilities
- Build, implement, and optimize CUDA kernels for equivariant neural network primitives (tensor products, segmented polynomials, triangle-based operations) targeting peak performance across NVIDIA GPU generations.
- Deliver end-to-end GPU-accelerated geometric ML primitives: implementation, validation, and production-quality software used by external frameworks.
- Build and maintain interfaces for PyTorch and JAX that expose cuEquivariance primitives to application developers and researchers.
- Drive CI/CD infrastructure for multi-GPU kernel builds, automated correctness testing, and performance regression tracking.
- Collaborate with Applied Science and research teams to evaluate new equivariant architectures and translate prototypes into production kernels.
- Engage with third-party framework developers and partners to align on interfaces and ensure integration into production pipelines.
Requirements
- 6+ years of software engineering experience with a strong background in CUDA and GPU programming.
- Deep proficiency in C++ and Python; experience building and shipping production libraries used by external developers.
- Solid foundation in GPU computing: memory hierarchy, warp-level execution, occupancy, and performance profiling methodology.
- Experience contributing to or building production scientific software libraries, ML frameworks, or developer-facing GPU APIs.
- Familiarity with geometric machine learning concepts (equivariance, group representations, irreducible representations, tensor products) sufficient to work effectively in the domain.
- BS/MS in Computer Science, Physics, Applied Mathematics, or a related field, or equivalent experience.
Ways to Stand Out
- Contributions to or deep usage of equivariant neural network frameworks (e3nn, MACE, NequIP, SE(3)-Transformers, or similar).
- Hands-on experience with Triton kernel development or other GPU kernel authoring tools alongside CUDA.
- Experience with mixed-precision or tensor-core-aware algorithm design for scientific or ML workloads.
- PhD or equivalent experience in computational chemistry, biophysics, physics, or computer science with a focus on geometric deep learning or HPC.
- Contributions to open-source geometric ML or GPU computing projects.
Compensation & Benefits
- Base salary ranges (location and level dependent):
  - Level 4: 184,000 USD - 287,500 USD
  - Level 5: 224,000 USD - 356,500 USD
- Eligible for equity and benefits (see www.nvidiabenefits.com).
Additional Information
- Applications accepted at least until May 26, 2026.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.