Principal Software Engineer, E2E Performance and Goodput — CSP Engagements

at NVIDIA
USD 272,000-431,250 per year
SENIOR
✅ On-site

Used Tools & Technologies

Machine Learning

Required Skills & Competences

Python @ 7, Leadership @ 4, Communication @ 4, Data Analysis @ 7, LLM @ 4, Pandas @ 7, CUDA @ 4, GPU @ 4, AI @ 4, Profiling @ 4, vLLM @ 4, NCCL @ 4, TensorRT @ 4, SGLang @ 4, HPC @ 8, Performance Analysis @ 4, NVLink @ 3

Details

We are looking for a Principal Engineer to join the CSP Engagements team as the technical focal point for end-to-end performance. You will work directly with engineering teams of key cloud service provider (CSP) / hyperscale customers to ensure they achieve performance targets on NVIDIA platforms. You will augment NVIDIA's performance and benchmark teams with a dedicated CSP-facing focus, drive work streams with CSP engineering teams, gather workload-specific feedback to influence NVIDIA optimization priorities, and validate performance targets in customer-representative configurations. Your cross-CSP visibility will help identify patterns and drive systemic improvements in documentation, configuration guidance, and tooling.

Responsibilities

  • Drive performance characterization work streams with engineering teams of key CSP/hyperscale customers — ensure they understand platform performance expectations, profiling methodology, and tuning options for their workloads.
  • Gather and synthesize CSP performance feedback — identify gaps between expected and actual throughput and champion optimization priorities back into NVIDIA's CUDA, NCCL, driver, and firmware teams.
  • Ensure key open-source performance and stress tools (e.g., STREAM, GPU Burn, GPU BLAST) are updated and validated for the latest NVIDIA rack-scale systems, GPU architectures, and CPU platforms so customers and internal teams have reliable baseline measurements.
  • Work closely with CSPs to ensure their performance and validation tooling reflects the latest GPU capabilities, memory hierarchy changes, and platform-specific tuning parameters.
  • Conduct cross-CSP performance comparison and pattern analysis — identify configuration, software, or workload differences that explain performance gaps between deployments.
  • Collaborate with CSPs to ensure performance-related integration work (profiling infrastructure, benchmark harnesses, config validation) is ready ahead of deployment milestones.
  • Define test strategies and tooling requirements for performance validation, covering both NVIDIA internal certification and customer acceptance.

Requirements

  • 15+ years of experience in systems performance engineering, ideally in GPU/HPC/ML infrastructure. BS or MS in Computer Science, Computer Engineering, or related field (or equivalent experience).
  • Proficiency in GPU workload profiling: Nsight Systems, Nsight Compute, DCGM metrics, or equivalent instrumentation.
  • Understanding of distributed training performance dynamics: computation/communication overlap, pipeline bubbles, memory bandwidth utilization, collective efficiency.
  • Knowledge of how the full software stack impacts performance: driver overhead, collective algorithm selection, memory allocation, scheduling, firmware power management.
  • Experience with statistical methods for performance analysis: regression detection, confidence intervals, A/B comparison at scale.
  • Strong data analysis and visualization skills (Python, pandas, dashboards).
  • Ability to communicate performance findings to both deep technical audiences and executive leadership.
  • Demonstrated success influencing multiple engineering teams to prioritize performance improvements.

Ways to stand out from the crowd

  • Experience profiling and optimizing distributed training at 1000+ GPU scale (Megatron-LM, DeepSpeed, FSDP).
  • Background in ML infrastructure performance at a CSP/hyperscaler.
  • Familiarity with NVIDIA platforms (DGX, HGX, NVLink topology) and profiling tools.
  • Experience building automated performance regression detection systems for production environments.
  • Understanding of inference workload performance dynamics (vLLM, TensorRT-LLM, SGLang, continuous batching).

Compensation & Benefits

  • Base salary range: 272,000 USD - 431,250 USD.
  • Eligible for equity and benefits (link to NVIDIA benefits referenced in the posting).

Additional information

  • Applications for this job will be accepted at least until June 30, 2026.
  • This posting is for an existing vacancy.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer and committed to fostering an inclusive work environment.