Senior Software Engineer - Parallel Computing Systems
Used Tools & Technologies
Not specified
Required Skills & Competences ?
Algorithms @ 4 Distributed Systems @ 4 Communication @ 4 Parallel Programming @ 7 Performance Optimization @ 4 PyTorch @ 4 CUDA @ 4 GPU @ 4Details
Do you have expertise in CUDA kernel optimization, C++ systems programming, or compiler infrastructure? Join NVIDIA's nvFuser team to build the next-generation fusion compiler that automatically optimizes deep learning models for workloads scaling to thousands of GPUs. This role sits at the intersection of compiler technology and high-performance computing; you will collaborate with the PyTorch Core team and Lightning-AI/Thunder to accelerate PyTorch workloads, and work with hardware architects, framework maintainers, and optimization experts to create compiler infrastructure that advances GPU performance.
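To make "fusion" concrete: a fusion compiler automates the kind of kernel merging shown in this hand-written CUDA sketch, where two elementwise kernels (scale, then bias-add) become a single kernel and the intermediate tensor, along with its round trip through global memory, disappears. This is only a minimal illustration of the idea; the kernel names are hypothetical and are not nvFuser internals.

```cuda
// Illustrative only: two unfused elementwise kernels vs. one fused kernel.
// Kernel and variable names are placeholders, not nvFuser internals.
#include <cuda_runtime.h>

__global__ void scale_kernel(const float* x, float* tmp, float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = alpha * x[i];   // writes an intermediate to global memory
}

__global__ void bias_kernel(const float* tmp, const float* b, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = tmp[i] + b[i];    // reads the intermediate back
}

// Fused version: one kernel, no intermediate tensor, roughly half the
// global-memory traffic for the same result.
__global__ void scale_bias_fused(const float* x, const float* b, float* y,
                                 float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = alpha * x[i] + b[i];
}
```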
Responsibilities
- Design algorithms that generate highly optimized code from deep learning programs.
- Build GPU-aware CPU runtime systems that coordinate kernel execution for maximum performance (a stream-coordination sketch follows this list).
- Debug and remove performance bottlenecks in distributed systems that can scale to thousands of GPUs.
- Collaborate with NVIDIA hardware engineers and optimization specialists to turn manual optimization techniques into automated compiler optimizations.
- Influence next-generation hardware design through performance analysis and close collaboration with architecture teams.
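For the runtime item above, here is a minimal sketch, assuming plain CUDA streams and events, of the kind of host-side coordination involved: dependent kernels are ordered on one stream while an independent branch overlaps on another. The kernel names and the dependency graph are hypothetical, not part of nvFuser's actual runtime.

```cuda
// Host-side coordination sketch using CUDA streams and events.
// Kernel names and the dependency graph are illustrative placeholders.
#include <cuda_runtime.h>

__global__ void producer(float* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= 2.0f;
}

__global__ void consumer(const float* a, float* b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] = a[i] + 1.0f;
}

__global__ void side_branch(const float* a, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] * a[i];
}

void launch_graph(float* a, float* b, float* c, int n) {
    cudaStream_t s0, s1;
    cudaEvent_t producer_done;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);
    cudaEventCreate(&producer_done);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Stream s0 runs the dependent chain: consumer is ordered after producer
    // automatically because they share a stream.
    producer<<<blocks, threads, 0, s0>>>(a, n);
    cudaEventRecord(producer_done, s0);
    consumer<<<blocks, threads, 0, s0>>>(a, b, n);

    // Stream s1 overlaps side_branch with consumer, but waits on the event so
    // it never reads `a` before producer has written it.
    cudaStreamWaitEvent(s1, producer_done, 0);
    side_branch<<<blocks, threads, 0, s1>>>(a, c, n);

    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);
    cudaEventDestroy(producer_done);
    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
}
```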
Requirements
- MS or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field (or equivalent experience).
- 4+ years of advanced C++ programming experience with large codebase development, template metaprogramming, and performance-critical code.
- Strong parallel programming experience (multi-threading, OpenMP, CUDA, MPI, NCCL, NVSHMEM, or other parallel computing technologies); a small CUDA reduction sketch follows this list.
- Demonstrated experience with low-level performance optimization and systematic bottleneck identification (beyond basic profiling).
- Performance analysis skills: experience analyzing high-level programs to identify performance bottlenecks and develop optimization strategies.
- Collaborative problem-solving approach, adaptability in ambiguous situations, first-principles thinking, and ownership.
- Excellent verbal and written communication skills.
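To give a flavor of the parallel programming listed above (this is context, not a screening exercise), here is a standard CUDA shared-memory tree reduction in which each block reduces a tile of the input to one partial sum.

```cuda
// Standard shared-memory tree reduction: each block reduces a tile of the
// input to a single partial sum; a second pass (or atomicAdd) combines them.
#include <cuda_runtime.h>

__global__ void block_sum(const float* x, float* partial, int n) {
    extern __shared__ float smem[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    smem[tid] = (i < n) ? x[i] : 0.0f;
    __syncthreads();

    // Halve the number of active threads at each step of the tree.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) smem[tid] += smem[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = smem[0];
}
```

A typical launch is `block_sum<<<blocks, threads, threads * sizeof(float)>>>(d_x, d_partial, n)`, with the per-block partial sums combined on the host or in a second kernel.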
Ways to stand out
- Experience with HPC/scientific computing: CUDA optimization, GPU programming, numerical libraries (cuBLAS, NCCL), or distributed computing.
- Compiler engineering background: LLVM, GCC, domain-specific language design, program analysis, or IR transformations and optimization passes.
- Deep technical foundation in CPU/GPU architectures, numerical libraries, modular software design, or runtime systems.
- Experience with large software projects, performance profiling, and a track record of rapid learning.
- Expertise with distributed parallelism techniques, tensor operations, auto-tuning, or performance modeling.
Compensation & Application
- Base salary ranges by level:
  - Level 4: 184,000 USD - 287,500 USD
  - Level 5: 224,000 USD - 356,500 USD
- You will also be eligible for equity and benefits.
- Applications accepted at least until July 29, 2025.
About NVIDIA
NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. We do not discriminate on the basis of any characteristic protected by law.