Senior Solution Engineer, AI Factory Triage

at NVIDIA

📍 Santa Clara, United States

USD 136,000-264,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Docker @ 4, Kubernetes @ 4, Linux @ 7, Python @ 6, Communication @ 4, Parallel Programming @ 4, CUDA @ 4, GPU @ 4

Details

NVIDIA is seeking an engineer who enjoys direct customer interaction and contributing to software and products to join the Solution Engineering team supporting NVIDIA’s GPU-accelerated platforms in AI Factories. You will work directly with customers to triage hardware platform issues and AI/ML workloads on rack-scale, multi-GPU datacenter platforms (including GB200), solve customer problems, and contribute to product and tooling development. Strong Linux expertise, solid programming skills, and experience with multi-GPU platforms are required. Occasional weekend and holiday work to support customers is expected.

Responsibilities

Provide direct support to NVIDIA Enterprise customers: answer questions and reproduce, resolve, or advance customer issues.
Work with engineering teams on customer issues, providing logs, reproduction information, and other triage details.
Create and update product and support tools.
Take ownership and drive customer issues from inception to resolution.
Document customer interactions and enhance the knowledge base.
Develop features and tools as part of solution engineering efforts supporting NVIDIA technologies.
Occasionally work weekends and holidays to support customers.

Requirements

Minimum of a BS in Computer Engineering, Electrical Engineering, or equivalent experience.
At least 5 years of engineering experience with multi-GPU platforms.
Strong system software expertise (firmware, BIOS, kernel, driver, operating system).
Solid understanding of Linux with the ability to analyze, optimize, and customize Linux environments for AI/ML workloads.
Experience with containerization, orchestration, and workload scheduling tools: Docker, Kubernetes, Slurm.
Proficient in C/C++ programming for platform OS, firmware, BIOS, kernel, drivers.
Proficient in Python with the ability to build custom tools.
Professional-level communication skills, including the ability to adjust communication to the technical level of the audience and to remain calm in difficult situations.
Excellent follow-up and organizational skills and strong problem-solving orientation.

Ways to stand out

Background with parallel programming or GPU acceleration (e.g., CUDA).
Experience developing in GPU-accelerated, cloud, or virtualized environments.
Experience analyzing software performance of distributed workloads.
Knowledge of clustering or HPC datacenter technologies and upper-layer protocols (NCCL, MPI).

Compensation & Other Information

Base salary range:
- Level 3: 136,000 USD - 212,750 USD
- Level 4: 168,000 USD - 264,500 USD
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until August 5, 2025.
NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.