Senior Software Engineer - Parallel Computing Systems
at NVIDIA
USD 184,000-356,500 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
- Algorithms (4)
- Distributed Systems (4)
- Communication (4)
- Parallel Programming (4)
- Performance Optimization (4)
- PyTorch (4)
- CUDA (7)
- GPU (4)
Details
Join NVIDIA's nvFuser team to build the next-generation fusion compiler that automatically optimizes deep learning models for workloads scaling to thousands of GPUs. This role focuses on compiler technology, systems-level performance, and parallel programming to improve GPU performance for AI workloads.
Responsibilities
- Design algorithms that generate highly optimized code from deep learning programs.
- Build GPU-aware CPU runtime systems that coordinate kernel execution for maximum performance.
- Identify and debug performance bottlenecks in large-scale (thousand-GPU) distributed systems.
- Collaborate with hardware architects, framework maintainers (including the PyTorch Core team), and optimization experts to turn manual optimization techniques into systematic compiler optimizations.
- Influence next-generation hardware design through performance-driven compiler work.
Requirements
- MS or PhD in Computer Science, Computer Engineering, Electrical Engineering, or related field (or equivalent experience).
- 4+ years of advanced C++ programming with large codebase development, template metaprogramming, and performance-critical code.
- Strong parallel programming experience with technologies such as multi-threading, OpenMP, CUDA, MPI, NCCL, NVSHMEM, or other parallel computing frameworks.
- Demonstrated experience with low-level performance optimization and systematic bottleneck identification beyond basic profiling.
- Performance analysis skills: experience analyzing high-level programs to find performance bottlenecks and develop optimization strategies.
- Collaborative problem-solving approach, adaptability in ambiguous situations, first-principles thinking, and strong ownership.
- Excellent verbal and written communication skills.
Ways to stand out
- Experience with HPC / scientific computing: CUDA optimization, GPU programming, numerical libraries (cuBLAS, NCCL), or distributed computing.
- Compiler engineering background: LLVM, GCC, domain-specific language design, program analysis, or IR transformations and optimization passes.
- Deep technical foundation in CPU/GPU architectures, numeric libraries, modular software design, or runtime systems.
- Experience with large software projects, advanced performance profiling, and a demonstrated track record of rapid learning.
- Expertise with distributed parallelism techniques, tensor operations, auto-tuning, or performance modeling.
Compensation & Benefits
- Base salary range (dependent on location, experience, and level):
  - Level 4: USD 184,000-287,500
  - Level 5: USD 224,000-356,500
- Eligible for equity and benefits (see NVIDIA benefits).
Other details
- Location: Santa Clara, CA, United States.
- Employment type: Full time.
- Applications accepted at least until July 29, 2025.
- NVIDIA is an equal opportunity employer and values diversity in its workforce.