Required Skills & Competences
Docker @ 4, Kubernetes @ 4, Linux @ 4, Python @ 6, Hiring @ 4, Communication @ 4, Parallel Programming @ 4, Customer Support @ 4, CUDA @ 4, GPU @ 4
Details
NVIDIA is hiring a Solution Engineer to triage hardware platform issues and AI/ML workloads on GPU-accelerated rack-scale platforms (including GB200) in large datacenters. The role combines direct customer support, deep technical troubleshooting, and hands-on software/tool development to resolve issues and improve NVIDIA products and support tooling.
Responsibilities
- Provide direct support to NVIDIA Enterprise customers: reproduce, resolve, or advance customer issues.
- Triage customer hardware platform issues and AI/ML workloads on multi-GPU and rack-scale platforms.
- Work with engineering teams to supply logs, reproduction steps, and triage information.
- Create and update product and support tools; develop features and tools as part of solution engineering efforts.
- Take ownership of customer issues from inception to resolution and document interactions to enhance the knowledge base.
- Occasionally work on weekends and holidays to support customers.
Requirements
- Minimum of a BS in Computer Engineering, Electrical Engineering, or equivalent experience.
- At least 5 years of engineering experience with multi-GPU platforms.
- Strong system software expertise (firmware, BIOS, kernel, drivers, operating systems).
- Solid understanding of Linux and ability to analyze, optimize, and customize Linux environments for AI/ML workloads.
- Experience with containerization, orchestration, and cluster scheduling tools: Docker, Kubernetes, Slurm.
- Proficient in C/C++ for platform OS, firmware, BIOS, kernel, and drivers.
- Proficient in Python with the ability to build custom tools.
- Excellent communication skills and ability to adjust communication to the technical level of the audience.
- Strong problem-solving and organizational skills, with disciplined follow-up.
Ways to stand out
- Background with parallel programming or GPU acceleration (e.g., CUDA).
- Experience developing in GPU-accelerated, cloud, or virtualized environments.
- Experience analyzing software performance of distributed GPU-accelerated workloads.
- Familiarity with clustering / HPC data center technologies and upper-layer protocols (NCCL, MPI).
Compensation & Benefits
- Base salary ranges by level: 136,000 USD - 212,750 USD (Level 3); 168,000 USD - 264,500 USD (Level 4).
- Eligible for equity and company benefits.
Additional information
- Location: Santa Clara, CA (United States).
- This is a full-time role.
- Applications accepted at least until August 5, 2025.
- NVIDIA is an equal opportunity employer and committed to diversity and inclusion.