Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Python @ 7
Communication @ 4
Data Analysis @ 7
CUDA @ 4
GPU @ 4
Deep Learning @ 4
AI @ 4
Profiling @ 4
Details
NVIDIA is seeking a Compute Kernel Performance Architect with a unique blend of skills: someone who can write, profile, and analyze CUDA kernels with a laser focus on power consumption and current draw — and who understands how those kernels interact with the GPU's Power Delivery Network (PDN) at a system level. This role involves writing stress workloads that deliberately push GPU power to its limits, partnering with hardware architects to validate power integrity assumptions, and helping ensure chips survive harsh real-world di/dt scenarios. You will work at the boundary of GPU architecture, software and silicon, influencing GPU power architecture for future NVIDIA products.
Responsibilities
- Design and develop CUDA kernels purpose-built to maximize GPU power consumption, targeting worst-case current draw across compute, memory, and I/O subsystems.
- Collaborate with hardware power architects to validate PDN assumptions and di/dt specs, so that stress workloads target the right weak points.
- Build and maintain a library of power stress microbenchmarks that sweep power profiles across GPU functional units — tensor cores, memory controllers, I/O interfaces — to stress PDN resonance and droop conditions across GPU families.
- Analyze trade-offs between kernel throughput, power efficiency, and voltage stability, contributing insights that feed directly into future GPU architecture decisions.
- Partner across teams — GPU architects, power circuit designers, silicon validation engineers — to ensure power stress methodologies are aligned from pre-silicon simulation through post-silicon bringup.
Requirements
- MS or PhD in Computer Science, Electrical Engineering, or Computer Engineering (or equivalent experience).
- 5+ years of experience in GPU kernel development, CUDA programming, or high-performance computing.
- Strong CUDA and C++ programming skills, with hands-on experience writing and optimizing kernels at the assembly or PTX level.
- Experience with GPU performance profiling tools — Nsight Compute, Nsight Systems, nvprof, or equivalent.
- Solid understanding of GPU architecture — SMs, memory hierarchy, power states, and how they map to current draw profiles.
- Working knowledge of Power Delivery Networks (PDNs) — including board-level PDN design, package inductance, decoupling capacitors, and their role in voltage droop and overshoot.
- Conceptual understanding of di/dt — how rapid current transitions cause voltage transients, and how software workloads can be designed to control or stress those transitions.
- Strong programming skills in Python for scripting, data analysis, and automation of power characterization workflows.
- Excellent communication skills and comfort working across hardware and software disciplines.
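As a rough illustration of the di/dt concept named in the requirements above, the first-order inductive relation V = L * di/dt can be sketched in Python. The function name and all numeric values here are hypothetical, chosen only for illustration; they are not NVIDIA figures, and a real PDN is not a single lumped inductance.

```python
def inductive_droop(inductance_h: float, delta_i_a: float, delta_t_s: float) -> float:
    """First-order voltage transient across a series inductance: V = L * di/dt.

    inductance_h: effective loop inductance in henries (assumed lumped)
    delta_i_a:    size of the current step in amperes
    delta_t_s:    time over which the step occurs, in seconds
    """
    if delta_t_s <= 0:
        raise ValueError("delta_t_s must be positive")
    return inductance_h * (delta_i_a / delta_t_s)


# Hypothetical example: 10 pH effective inductance, 100 A load step in 10 ns.
droop_v = inductive_droop(10e-12, 100.0, 10e-9)
print(f"Estimated transient: {droop_v * 1e3:.1f} mV")
```

The same current step spread over a longer interval gives a proportionally smaller transient, which is why workloads that ramp activity abruptly stress the supply more than those that ramp gradually.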
Ways to Stand Out
- Hands-on experience writing GPU power stress microbenchmarks — synthetic workloads designed to hit worst-case power consumption on specific GPU functional units.
- Direct experience with post-silicon power characterization — measuring VDD voltage droop, di/dt slew rates, and power supply transient response using oscilloscopes, sensors, or equivalent lab tools.
- Experience with DVFS, AVFS, and noise mitigation features, and an understanding of how they interact with kernel behavior.
- Knowledge of PDN impedance targets across die, package, and board domains, and how resonance frequencies map to observed voltage droop signatures.
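For readers less familiar with the resonance point in the last item, a minimal sketch follows, assuming an idealized single lumped LC stage with resonance at f = 1 / (2 * pi * sqrt(L * C)). The function name and component values are hypothetical; real PDNs have several resonances across die, package, and board, and this sketch models none of their damping.

```python
import math


def lc_resonance_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of an ideal lumped LC stage: 1 / (2*pi*sqrt(L*C))."""
    if inductance_h <= 0 or capacitance_f <= 0:
        raise ValueError("inductance and capacitance must be positive")
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Hypothetical example: 10 pH of inductance resonating with 1 uF of capacitance.
f_res = lc_resonance_hz(10e-12, 1e-6)
print(f"Resonance near {f_res / 1e6:.1f} MHz")
```

A workload whose activity toggles near such a frequency would be expected to excite larger voltage swings than one toggling far from it, which is the motivation for sweeping power profiles across frequencies.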
Additional information
- The team partners with Compute Architecture, Silicon Solutions, Power Architecture, and Deep Learning framework teams. Work influences both silicon power delivery and the software stack.
- Base salary range: 184,000 USD - 287,500 USD for Level 4; 224,000 USD - 356,500 USD for Level 5.
- You will also be eligible for equity and benefits.
- Applications for this job will be accepted at least until April 11, 2026.
- NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer.