Senior System Software Engineer, AI Infrastructure
at NVIDIA
Santa Clara, United States
USD 120,000-235,750 per year
Required Skills & Competences
Marketing @ 4, Software Development @ 6, Kubernetes @ 4, Python @ 7, Communication @ 4, Networking @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4
Details
NVIDIA platforms power generative AI, autonomous driving, industrial robots, medical instruments and data centers worldwide. As a Senior System Software Engineer focused on AI Infrastructure, you will evaluate and optimize NVIDIA hardware and software stacks, collaborate across engineering, product, and marketing teams, and improve developer experience for GPU-accelerated AI platforms, SDKs, libraries, and frameworks.
Responsibilities
- Run multi-node training and inference jobs on large GPU clusters to assess performance, validate usability, and improve products.
- Design benchmark suites that highlight NVIDIA hardware, networking, and software stacks.
- Profile deep-learning workloads, identify bottlenecks, and deliver optimization guidance.
- Produce concise tutorials, scripts, whitepapers, and developer education artifacts.
- Analyze competitive solutions and craft data-driven product positioning.
- Present live demos at conferences such as GTC, CES, and SIGGRAPH.
Requirements
- 3+ years in software development, tech marketing, evangelism, or similar roles.
- BS/MS in Computer Science, Computer Engineering, Electrical Engineering, or a related field (or equivalent experience).
- Strong Python and C++ skills for AI and HPC workloads.
- Hands-on multi-node experience with Slurm, Kubernetes, or cloud service provider (CSP) clusters.
- Solid grasp of deep-learning architectures, PyTorch, and distributed training methods.
- Understanding of CPU/GPU architecture and experience with CUDA, cuDNN, TensorRT-LLM, Triton, and NCCL.
- Experience profiling and optimizing DL workloads for performance on GPU clusters.
- Excellent written and verbal communication skills for technical and executive audiences.
Ways to stand out
- Hands-on experience setting up and tuning HPC clusters with Slurm, Kubernetes, or other schedulers.
- Public technical blogs, talks, forum activity, or notable open-source projects; prior work with customers or technical press on AI performance topics.
- Exceptional ability to simplify complex technology for diverse audiences.
- Familiarity with modern LLM architectures and the ability to write Torch code and occasional custom GPU kernels.
- Expertise in InfiniBand, NVLink, RoCE, RDMA, and collective-communication libraries.
Compensation & Benefits
- Base salary range (location and level dependent):
  - Level 2: 120,000 USD - 189,750 USD
  - Level 3: 148,000 USD - 235,750 USD
- Eligible for equity and benefits (see NVIDIA benefits page).
Other details
- Applications accepted at least until August 18, 2025.
- NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.