Senior Solution Engineer, AI Factory Triage
at Nvidia
Santa Clara, United States
USD 136,000-264,500 per year
Required Skills & Competences
Docker @ 4, Kubernetes @ 4, Linux @ 7, Python @ 6, Communication @ 4, Parallel Programming @ 4, CUDA @ 4, GPU @ 4
Details
NVIDIA is seeking an engineer who enjoys working directly with customers and contributing to software and products to join the Solution Engineering team supporting NVIDIA's GPU-accelerated platforms in AI Factories. You will work directly with customers to triage hardware platform issues and AI/ML workloads on rack-scale, multi-GPU datacenter platforms (including GB200), solve customer problems, and contribute to product and tooling development. Strong Linux expertise, solid programming skills, and experience with multi-GPU platforms are required. Occasional weekend and holiday work to support customers is expected.
Responsibilities
- Provide direct support to NVIDIA Enterprise customers: answer questions and reproduce, resolve, or advance customer issues.
- Work with engineering teams on customer issues, providing logs, reproduction information, and other triage details.
- Create and update product and support tools.
- Take ownership and drive customer issues from inception to resolution.
- Document customer interactions and enhance the knowledge base.
- Develop features and tools as part of solution engineering efforts supporting NVIDIA technologies.
- Occasional work on weekends and holidays to support customers.
Requirements
- Minimum of a BS in Computer Engineering, Electrical Engineering, or equivalent experience.
- 5+ years of engineering experience with multi-GPU platforms.
- Strong system software expertise (firmware, BIOS, kernel, driver, operating system).
- Solid understanding of Linux with the ability to analyze, optimize, and customize Linux environments for AI/ML workloads.
- Experience with containerization and cluster workload management: Docker, Kubernetes, Slurm.
- Proficiency in C/C++ programming for platform OS, firmware, BIOS, kernel, and drivers.
- Proficiency in Python, with the ability to build custom tools.
- Professional-level communication skills, with the ability to adapt to the audience's technical level and remain calm in difficult situations.
- Excellent follow-up and organizational skills, and a strong problem-solving orientation.
Ways to stand out
- Background in parallel programming or GPU acceleration (e.g., CUDA).
- Experience developing in GPU-accelerated, cloud, or virtualized environments.
- Experience analyzing software performance of distributed workloads.
- Knowledge of clustering or HPC datacenter technologies and upper-layer protocols (NCCL, MPI).
Compensation & Other Information
- Base salary range:
  - Level 3: 136,000 USD - 212,750 USD
  - Level 4: 168,000 USD - 264,500 USD
- You will also be eligible for equity and benefits.
- Applications for this job will be accepted at least until August 5, 2025.
- NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.