Senior Solution Engineer, AI Factory Triage

at NVIDIA
USD 136,000-264,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Docker @ 4 Kubernetes @ 4 Linux @ 4 Python @ 6 Hiring @ 4 Communication @ 4 Parallel Programming @ 4 Customer Support @ 4 CUDA @ 4 GPU @ 4

Details

NVIDIA is hiring a Solution Engineer to triage hardware platform issues and AI/ML workloads on GPU-accelerated rack-scale platforms (including GB200) in large datacenters. The role combines direct customer support, deep technical troubleshooting, and hands-on software/tool development to resolve issues and improve NVIDIA products and support tooling.

Responsibilities

  • Provide direct support to NVIDIA Enterprise customers: reproduce, resolve, or advance customer issues.
  • Triage customer hardware platform issues and AI/ML workloads on multi-GPU and rack-scale platforms.
  • Work with engineering teams to supply logs, reproduction steps, and triage information.
  • Create and update product and support tools; develop features and tools as part of solution engineering efforts.
  • Take ownership of customer issues from inception to resolution and document interactions to enhance the knowledge base.
  • Occasionally work on weekends and holidays to support customers.

Requirements

  • Minimum of a BS in Computer Engineering, Electrical Engineering, or equivalent experience.
  • At least 5 years of engineering experience with multi-GPU platforms.
  • Strong system software expertise (firmware, BIOS, kernel, drivers, operating systems).
  • Solid understanding of Linux and ability to analyze, optimize, and customize Linux environments for AI/ML workloads.
  • Experience with containerized solutions and workload orchestration: Docker, Kubernetes, Slurm.
  • Proficient in C/C++ for platform OS, firmware, BIOS, kernel, and drivers.
  • Proficient in Python with the ability to build custom tools.
  • Excellent communication skills and ability to adjust communication to the technical level of the audience.
  • Strong problem-solving and organizational skills, with disciplined follow-up.

Ways to stand out

  • Background with parallel programming or GPU acceleration (e.g., CUDA).
  • Experience developing in GPU-accelerated, cloud, or virtualized environments.
  • Experience analyzing software performance of distributed GPU-accelerated workloads.
  • Familiarity with clustering / HPC data center technologies and upper-layer protocols (NCCL, MPI).

Compensation & Benefits

  • Base salary ranges by level: 136,000 USD - 212,750 USD (Level 3); 168,000 USD - 264,500 USD (Level 4).
  • Eligible for equity and company benefits.

Additional information

  • Location: Santa Clara, CA (United States).
  • This is a full-time role.
  • Applications accepted at least until August 5, 2025.
  • NVIDIA is an equal opportunity employer and committed to diversity and inclusion.