Senior Solution Engineer, AI Factory Triage

at Nvidia

📍 Santa Clara, United States

USD 136,000-264,500 per year

SENIOR

✅ On-site

Required Skills & Competences

Docker @ 4, Kubernetes @ 4, Linux @ 4, Python @ 6, Hiring @ 4, Communication @ 4, Parallel Programming @ 4, Customer Support @ 4, CUDA @ 4, GPU @ 4

Details

NVIDIA is hiring a Senior Solution Engineer to triage hardware platform and AI/ML workload issues on GPU-accelerated rack-scale platforms (including GB200) in large datacenters. The role combines direct customer support, deep technical troubleshooting, and hands-on software and tool development to resolve issues and improve NVIDIA products and support tooling.

Responsibilities

Provide direct support to NVIDIA Enterprise customers: reproduce, resolve, or escalate customer issues.
Triage customer hardware platform and AI/ML workload issues on multi-GPU and rack-scale platforms.
Work with engineering teams to supply logs, reproduction steps, and triage information.
Create and update product and support tools; develop features and tools as part of solution engineering efforts.
Take ownership of customer issues from inception to resolution and document interactions to enhance the knowledge base.
Occasionally work on weekends and holidays to support customers.

Requirements

Minimum of a BS in Computer Engineering, Electrical Engineering, or equivalent experience.
At least 5 years of engineering experience with multi-GPU platforms.
Strong system software expertise (firmware, BIOS, kernel, drivers, operating systems).
Solid understanding of Linux and ability to analyze, optimize, and customize Linux environments for AI/ML workloads.
Experience with containerization, orchestration, and cluster scheduling tools: Docker, Kubernetes, Slurm.
Proficient in C/C++ for platform OS, firmware, BIOS, kernel, and drivers.
Proficient in Python with the ability to build custom tools.
Excellent communication skills and ability to adjust communication to the technical level of the audience.
Strong problem-solving and organizational skills, with disciplined follow-up.

Ways to stand out

Background with parallel programming or GPU acceleration (e.g., CUDA).
Experience developing in GPU-accelerated, cloud, or virtualized environments.
Experience analyzing software performance of distributed GPU-accelerated workloads.
Familiarity with clustering / HPC data center technologies and upper-layer protocols (NCCL, MPI).

Compensation & Benefits

Base salary ranges by level: 136,000 USD - 212,750 USD (Level 3); 168,000 USD - 264,500 USD (Level 4).
Eligible for equity and company benefits.

Additional information

Location: Santa Clara, CA (United States).
This is a full-time role.
Applications accepted at least until August 5, 2025.
NVIDIA is an equal opportunity employer and committed to diversity and inclusion.