Senior Solution Engineer, System Diagnostics Software
Required Skills & Competences
Docker @ 4, Kubernetes @ 4, Linux @ 7, Python @ 6, Communication @ 7, Parallel Programming @ 4, CUDA @ 4, GPU @ 4
Details
NVIDIA is looking for an engineer who wants the excitement of direct customer interaction and the reward of contributing to software and products to join our team of Solution Engineers supporting NVIDIA's GPU-accelerated platforms in AI Factories. You will work directly with customers to deliver solutions on NVIDIA platforms, including the GB200. The role focuses on triaging customer hardware platform issues and AI/ML workloads on large rack-scale datacenter platforms, solving customer problems, and contributing to products and software tooling. Excellent problem-solving and communication skills, the ability to work on multiple projects, strong Linux knowledge, solid programming skills, and experience with multi-GPU platforms are required. Expertise in analyzing the performance of distributed GPU-accelerated workloads is a plus.
Responsibilities
- Provide direct support to NVIDIA Enterprise customers: answer questions and reproduce, resolve, or advance customer issues.
- Work with engineering teams on customer issues, providing logs, reproduction information, and other triage information.
- Create and update product and support tools.
- Take ownership and drive customer issues from inception to resolution.
- Document customer interactions and improve the knowledge base.
- Develop features and tools as part of solution engineering efforts to support NVIDIA technologies.
- Occasionally work on weekends and holidays to support customers.
Requirements
- Minimum of a BS in Computer Engineering, Electrical Engineering, or equivalent experience.
- At least 5 years of engineering experience with multi-GPU platforms.
- Strong system software expertise (firmware, BIOS, kernel, driver, operating system).
- Solid understanding of Linux; ability to analyze, optimize, and customize Linux environments for AI/ML workloads.
- Experience with containerized solutions: Docker, Kubernetes, Slurm.
- Proficient in C/C++ programming for platform OS, firmware, BIOS, kernel, drivers.
- Proficient in Python with the ability to build custom tools.
- Professional-level communication skills; ability to adjust communication to audience and remain calm in difficult situations.
- Excellent follow-up and organizational skills; passion for solving problems.
Ways to stand out
- Background with parallel programming or GPU acceleration (e.g., CUDA).
- Experience developing in GPU-accelerated, cloud, or virtualized environments.
- Experience analyzing software performance of distributed workloads.
- Knowledge of clustering or HPC datacenter technologies, including Upper Layer Protocols (NCCL, MPI).
Compensation & Benefits
- Base salary will be determined based on location, experience, and the pay of employees in similar positions. The base salary range is 136,000 USD - 212,750 USD for Level 3 and 168,000 USD - 264,500 USD for Level 4.
- You will also be eligible for equity and benefits (see NVIDIA benefits page).
Additional Information
- Location: Santa Clara, CA, United States (customer-facing role).
- Applications accepted at least until September 19, 2025.
- NVIDIA is an equal opportunity employer committed to a diverse work environment.